Commit graph

746 commits

Author SHA1 Message Date
David Sterba
cc8385b59e btrfs: preallocate radix tree node for readahead
We can preallocate the node so insertion does not have to do that under
the lock. The GFP flags for the per-device radix tree are initialized to
 GFP_NOFS & ~__GFP_DIRECT_RECLAIM
but we can use GFP_KERNEL, same as an allocation above anyway, but also
because readahead is optional and not on any critical writeout path.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo
539b50d2f6 Btrfs: convert BUG_ON to WARN_ON
These two BUG_ON()s would never be true, ensured by callers' logic.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo
2b19a1fef7 Btrfs: helper for ops that requires full stripe
This adds a helper to show directly whether ops require full stripe.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo
6fad823f49 Btrfs: do not add extra mirror when dev_replace target dev is not available
With this, we can avoid allocating memory for dev replace copies if the
target dev is not available.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:25 +02:00
Liu Bo
73c0f22825 Btrfs: handle operations for device replace separately
Since this part is mostly independent, this moves it to a separate
function.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo
5ab56090b8 Btrfs: introduce a function to get extra mirror from replace
As the part of getting extra mirror in __btrfs_map_block is
independent, this puts it into a separate function.

Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo
0b3d4cd371 Btrfs: separate DISCARD from __btrfs_map_block
Since DISCARD is not as important as an operation like write, we don't
copy it to target device during replace, and it makes __btrfs_map_block
less complex.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Liu Bo
592d92eeab Btrfs: create a helper for getting chunk map
We have similar code here and there, this merges them into a helper.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:24 +02:00
Elena Reshetova
490b54d6fb btrfs: convert extent_map.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Elena Reshetova
140475ae4a btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Adam Borowski
1450612797 btrfs: fix a bogus warning when converting only data or metadata
If your filesystem has, eg, data:raid0 metadata:raid1, and you run "btrfs
balance -dconvert=raid1", the meta.target field will be uninitialized.
That's otherwise ok, as it's unused except for this warning.

Thus, let's use the existing set of raid levels for the comparison.

As a side effect, non-convert balances will now nag about data>metadata.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-18 14:07:23 +02:00
Linus Torvalds
4b31ac485d Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Dave Sterba collected a few more fixes for the last rc.

  These aren't marked for stable, but I'm putting them in with a batch
  were testing/sending by hand for this release"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix potential use-after-free for cloned bio
  Btrfs: fix segmentation fault when doing dio read
  Btrfs: fix invalid dereference in btrfs_retry_endio
  btrfs: drop the nossd flag when remounting with -o ssd
2017-04-14 16:53:45 -07:00
Liu Bo
a967efb30b Btrfs: fix potential use-after-free for cloned bio
KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.

If we need to submit IOs to several disks at a time, the original bio
would get cloned and mapped to the destination disk, but we really should
use the original bio instead of a cloned bio to do the sanity check
because cloned bios are likely to be freed by its endio.

Reported-by: Diego <diegocg@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-11 18:49:56 +02:00
Linus Torvalds
bbe08c0a43 Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "Btrfs round two.

  These are mostly a continuation of Dave Sterba's collection of
  cleanups, but Filipe also has some bug fixes and performance
  improvements"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (69 commits)
  btrfs: add dummy callback for readpage_io_failed and drop checks
  btrfs: drop checks for mandatory extent_io_ops callbacks
  btrfs: document existence of extent_io ops callbacks
  btrfs: let writepage_end_io_hook return void
  btrfs: do proper error handling in btrfs_insert_xattr_item
  btrfs: handle allocation error in update_dev_stat_item
  btrfs: remove BUG_ON from __tree_mod_log_insert
  btrfs: derive maximum output size in the compression implementation
  btrfs: use predefined limits for calculating maximum number of pages for compression
  btrfs: export compression buffer limits in a header
  btrfs: merge nr_pages input and output parameter in compress_pages
  btrfs: merge length input and output parameter in compress_pages
  btrfs: constify name of subvolume in creation helpers
  btrfs: constify buffers used by compression helpers
  btrfs: constify input buffer of btrfs_csum_data
  btrfs: constify device path passed to relevant helpers
  btrfs: make btrfs_inode_resume_unlocked_dio take btrfs_inode
  btrfs: make btrfs_inode_block_unlocked_dio take btrfs_inode
  btrfs: Make btrfs_add_nondir take btrfs_inode
  btrfs: Make btrfs_add_link take btrfs_inode
  ...
2017-03-02 16:03:00 -08:00
David Sterba
fa2529923d btrfs: handle allocation error in update_dev_stat_item
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 14:27:11 +01:00
David Sterba
da353f6b30 btrfs: constify device path passed to relevant helpers
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 14:26:07 +01:00
Linus Torvalds
9003ed1fed Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This has a series of fixes and cleanups that Dave Sterba has been
  collecting.

  There is a pretty big variety here, cleaning up internal APIs and
  fixing corner cases"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (124 commits)
  Btrfs: use the correct type when creating cow dio extent
  Btrfs: fix deadlock between dedup on same file and starting writeback
  btrfs: use btrfs_debug instead of pr_debug in transaction abort
  btrfs: btrfs_truncate_free_space_cache always allocates path
  btrfs: free-space-cache, clean up unnecessary root arguments
  btrfs: convert btrfs_inc_block_group_ro to accept fs_info
  btrfs: flush_space always takes fs_info->fs_root
  btrfs: pass fs_info to (more) routines that are only called with extent_root
  btrfs: qgroup: Move half of the qgroup accounting time out of commit trans
  btrfs: remove unused parameter from adjust_slots_upwards
  btrfs: remove unused parameters from __btrfs_write_out_cache
  btrfs: remove unused parameter from cleanup_write_cache_enospc
  btrfs: remove unused parameter from __add_inode_ref
  btrfs: remove unused parameter from clone_copy_inline_extent
  btrfs: remove unused parameters from btrfs_cmp_data
  btrfs: remove unused parameter from __add_inline_refs
  btrfs: remove unused parameters from scrub_setup_wr_ctx
  btrfs: remove unused parameter from create_snapshot
  btrfs: remove unused parameter from init_first_rw_device
  btrfs: remove unused parameter from __btrfs_alloc_chunk
  ...
2017-02-25 14:53:58 -08:00
David Sterba
e4a4dce72e btrfs: remove unused parameter from init_first_rw_device
The 'device' used to be added in that function, but now it's done by the
caller.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-17 12:03:54 +01:00
David Sterba
72b468c8da btrfs: remove unused parameter from __btrfs_alloc_chunk
We grab fs_info from other parameters.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-17 12:03:53 +01:00
Jan Kara
efa7c9f97e block: Get rid of blk_get_backing_dev_info()
blk_get_backing_dev_info() is now a simple dereference. Remove that
function and simplify some code around that.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:21:32 -07:00
Linus Torvalds
087a76d390 Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "Jeff Mahoney and Dave Sterba have a really nice set of cleanups in
  here, and Christoph pitched in corrections/improvements to make btrfs
  use proper helpers for bio walking instead of doing it by hand.

  There are some key fixes as well, including some long standing bugs
  that took forever to track down in btrfs_drop_extents and during
  balance"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (77 commits)
  btrfs: limit async_work allocation and worker func duration
  Revert "Btrfs: adjust len of writes if following a preallocated extent"
  Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors
  btrfs: opencode chunk locking, remove helpers
  btrfs: remove root parameter from transaction commit/end routines
  btrfs: split btrfs_wait_marked_extents into normal and tree log functions
  btrfs: take an fs_info directly when the root is not used otherwise
  btrfs: simplify btrfs_wait_cache_io prototype
  btrfs: convert extent-tree tracepoints to use fs_info
  btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
  btrfs: root->fs_info cleanup, add fs_info convenience variables
  btrfs: root->fs_info cleanup, update_block_group{,flags}
  btrfs: root->fs_info cleanup, lock/unlock_chunks
  btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
  btrfs: pull node/sector/stripe sizes out of root and into fs_info
  btrfs: root->fs_info cleanup, io_ctl_init
  btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
  btrfs: struct reada_control.root -> reada_control.fs_info
  btrfs: struct btrfsic_state->root should be an fs_info
  btrfs: alloc_reserved_file_extent trace point should use extent_root
  ...
2016-12-16 10:53:01 -08:00
David Sterba
34441361c4 btrfs: opencode chunk locking, remove helpers
The helpers are trivial and we don't use them consistently.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney
3a45bb207e btrfs: remove root parameter from transaction commit/end routines
Now we only use the root parameter to print the root objectid in
a tracepoint.  We can use the root parameter from the transaction
handle for that.  It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney
2ff7e61e0d btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
0b246afa62 btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
3796d33535 btrfs: root->fs_info cleanup, lock/unlock_chunks
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
da17066c40 btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
fb456252d3 btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
5112febbc7 btrfs: btrfs_init_new_device should use fs_info->dev_root
btrfs_init_new_device only uses the root passed in via the ioctl to
start the transaction.  Nothing else that happens is related to whatever
root the user used to initiate the ioctl.  We can drop the root requirement
and just use fs_info->dev_root instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
6bccf3ab1e btrfs: call functions that always use the same root with fs_info instead
There are many functions that are always called with the same root
argument.  Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Jeff Mahoney
5b4aacefb8 btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
David Sterba
b159fa2808 btrfs: remove constant parameter to memset_extent_buffer and rename it
The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:17 +01:00
David Sterba
d24ee97b96 btrfs: use new helpers to set uuids in eb
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:17 +01:00
Christoph Hellwig
cf8cddd38b btrfs: don't abuse REQ_OP_* flags for btrfs_map_block
btrfs_map_block supports different types of mappings, which to a large
extent resemble block layer operations.  But they don't always do, and
currently btrfs dangerously overlays it's own flag over the block layer
flags.  This is just asking for a conflict, so introduce a different
map flags enum inside of btrfs instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-29 14:10:38 +01:00
Christoph Hellwig
70fd76140a block,fs: use REQ_* flags directly
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Christoph Hellwig
67f055c798 btrfs: use op_is_sync to check for synchronous requests
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Chris Mason
19c4d2f994 Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
This reverts commit 5d8eb6fe51.

When we remove devices, we free the device structures.  Delaying
btfs_remove_chunk() ends up hitting a use-after-free on them.

Signed-off-by: Chris Mason <clm@fb.com>
2016-10-10 13:43:31 -07:00
Anand Jain
0ccd05285e btrfs: fix a possible umount deadlock
btrfs_show_devname() is using the device_list_mutex, sometimes
a call to blkdev_put() leads vfs calling into this func. So
call blkdev_put() outside of device_list_mutex, as of now.

[  983.284212] ======================================================
[  983.290401] [ INFO: possible circular locking dependency detected ]
[  983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted
[  983.302081] -------------------------------------------------------
[  983.308357] umount/21720 is trying to acquire lock:
[  983.313243]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff9128ec51>] blkdev_put+0x31/0x150
[  983.321264]
[  983.321264] but task is already holding lock:
[  983.327101]  (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs]
[  983.337839]
[  983.337839] which lock already depends on the new lock.
[  983.337839]
[  983.346024]
[  983.346024] the existing dependency chain (in reverse order) is:
[  983.353512]
-> #4 (&fs_devs->device_list_mutex){+.+...}:
[  983.359096]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.365143]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
[  983.371521]        [<ffffffffc02d8116>] btrfs_show_devname+0x36/0x1f0 [btrfs]
[  983.378710]        [<ffffffff9129523e>] show_vfsmnt+0x4e/0x150
[  983.384593]        [<ffffffff9126ffc7>] m_show+0x17/0x20
[  983.389957]        [<ffffffff91276405>] seq_read+0x2b5/0x3b0
[  983.395669]        [<ffffffff9124c808>] __vfs_read+0x28/0x100
[  983.401464]        [<ffffffff9124eb3b>] vfs_read+0xab/0x150
[  983.407080]        [<ffffffff9124ec32>] SyS_read+0x52/0xb0
[  983.412609]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.419617]
-> #3 (namespace_sem){++++++}:
[  983.424024]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.430074]        [<ffffffff918239e9>] down_write+0x49/0x80
[  983.435785]        [<ffffffff91272457>] lock_mount+0x67/0x1c0
[  983.441582]        [<ffffffff91272ab2>] do_add_mount+0x32/0xf0
[  983.447458]        [<ffffffff9127363a>] finish_automount+0x5a/0xc0
[  983.453682]        [<ffffffff91259513>] follow_managed+0x1b3/0x2a0
[  983.459912]        [<ffffffff9125b750>] lookup_fast+0x300/0x350
[  983.465875]        [<ffffffff9125d6e7>] path_openat+0x3a7/0xaa0
[  983.471846]        [<ffffffff9125ef75>] do_filp_open+0x85/0xe0
[  983.477731]        [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0
[  983.483702]        [<ffffffff9124c4de>] SyS_open+0x1e/0x20
[  983.489240]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.496254]
-> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[  983.501798]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.507855]        [<ffffffff918239e9>] down_write+0x49/0x80
[  983.513558]        [<ffffffff91366237>] start_creating+0x87/0x100
[  983.519703]        [<ffffffff91366647>] debugfs_create_dir+0x17/0x100
[  983.526195]        [<ffffffff911df153>] bdi_register+0x93/0x210
[  983.532165]        [<ffffffff911df313>] bdi_register_owner+0x43/0x70
[  983.538570]        [<ffffffff914080fb>] device_add_disk+0x1fb/0x450
[  983.544888]        [<ffffffff91580226>] loop_add+0x1e6/0x290
[  983.550596]        [<ffffffff91fec358>] loop_init+0x10b/0x14f
[  983.556394]        [<ffffffff91002207>] do_one_initcall+0xa7/0x180
[  983.562618]        [<ffffffff91f932e0>] kernel_init_freeable+0x1cc/0x266
[  983.569370]        [<ffffffff918174be>] kernel_init+0xe/0x100
[  983.575166]        [<ffffffff9182620f>] ret_from_fork+0x1f/0x40
[  983.581131]
-> #1 (loop_index_mutex){+.+.+.}:
[  983.585801]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.591858]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
[  983.598256]        [<ffffffff9157ed3f>] lo_open+0x1f/0x60
[  983.603704]        [<ffffffff9128eec3>] __blkdev_get+0x123/0x400
[  983.609757]        [<ffffffff9128f4ea>] blkdev_get+0x34a/0x350
[  983.615639]        [<ffffffff9128f554>] blkdev_open+0x64/0x80
[  983.621428]        [<ffffffff9124aff6>] do_dentry_open+0x1c6/0x2d0
[  983.627651]        [<ffffffff9124c029>] vfs_open+0x69/0x80
[  983.633181]        [<ffffffff9125db74>] path_openat+0x834/0xaa0
[  983.639152]        [<ffffffff9125ef75>] do_filp_open+0x85/0xe0
[  983.645035]        [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0
[  983.650999]        [<ffffffff9124c4de>] SyS_open+0x1e/0x20
[  983.656535]        [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  983.663541]
-> #0 (&bdev->bd_mutex){+.+.+.}:
[  983.668107]        [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0
[  983.674510]        [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.680561]        [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
[  983.686967]        [<ffffffff9128ec51>] blkdev_put+0x31/0x150
[  983.692761]        [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs]
[  983.699699]        [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs]
[  983.707178]        [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs]
[  983.714380]        [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs]
[  983.721061]        [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs]
[  983.727908]        [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100
[  983.734744]        [<ffffffff91250f56>] kill_anon_super+0x16/0x30
[  983.740888]        [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs]
[  983.747909]        [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80
[  983.754745]        [<ffffffff912515fd>] deactivate_super+0x5d/0x70
[  983.760977]        [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80
[  983.766773]        [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20
[  983.772738]        [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0
[  983.778708]        [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4
[  983.785373]        [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0
[  983.792212]        [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1
[  983.799225]
[  983.799225] other info that might help us debug this:
[  983.799225]
[  983.807291] Chain exists of:
  &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex

[  983.816521]  Possible unsafe locking scenario:
[  983.816521]
[  983.822489]        CPU0                    CPU1
[  983.827043]        ----                    ----
[  983.831599]   lock(&fs_devs->device_list_mutex);
[  983.836289]                                lock(namespace_sem);
[  983.842268]                                lock(&fs_devs->device_list_mutex);
[  983.849478]   lock(&bdev->bd_mutex);
[  983.853127]
[  983.853127]  *** DEADLOCK ***
[  983.853127]
[  983.859113] 3 locks held by umount/21720:
[  983.863145]  #0:  (&type->s_umount_key#35){++++..}, at: [<ffffffff912515f5>] deactivate_super+0x55/0x70
[  983.872713]  #1:  (uuid_mutex){+.+.+.}, at: [<ffffffffc033d8d3>] btrfs_close_devices+0x23/0xa0 [btrfs]
[  983.882206]  #2:  (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs]
[  983.893422]
[  983.893422] stack backtrace:
[  983.897824] CPU: 6 PID: 21720 Comm: umount Not tainted 4.8.0-rc5-ceph-00023-g1b39cec2 #1
[  983.905958] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[  983.913492]  0000000000000000 ffff8c8a53c17a38 ffffffff91429521 ffffffff9260f4f0
[  983.921018]  ffffffff92642760 ffff8c8a53c17a88 ffffffff911b2b04 0000000000000050
[  983.928542]  ffffffff9237d620 ffff8c8a5294aee0 ffff8c8a5294aeb8 ffff8c8a5294aee0
[  983.936072] Call Trace:
[  983.938545]  [<ffffffff91429521>] dump_stack+0x85/0xc4
[  983.943715]  [<ffffffff911b2b04>] print_circular_bug+0x1fb/0x20c
[  983.949748]  [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0
[  983.955613]  [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0
[  983.961123]  [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150
[  983.966550]  [<ffffffff91823125>] mutex_lock_nested+0x65/0x350
[  983.972407]  [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150
[  983.977832]  [<ffffffff9128ec51>] blkdev_put+0x31/0x150
[  983.983101]  [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs]
[  983.989500]  [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs]
[  983.996415]  [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs]
[  984.003068]  [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs]
[  984.009189]  [<ffffffff9126cc5e>] ? evict_inodes+0x15e/0x170
[  984.014881]  [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs]
[  984.021176]  [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100
[  984.027476]  [<ffffffff91250f56>] kill_anon_super+0x16/0x30
[  984.033082]  [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs]
[  984.039548]  [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80
[  984.045839]  [<ffffffff912515fd>] deactivate_super+0x5d/0x70
[  984.051525]  [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80
[  984.056774]  [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20
[  984.062201]  [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0
[  984.067625]  [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4
[  984.073747]  [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0
[  984.080038]  [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 19:37:06 +02:00
Jeff Mahoney
ab8d0fc48d btrfs: convert pr_* to btrfs_* where possible
For many printks, we want to know which file system issued the message.

This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.

fs/btrfs/check-integrity.c is left alone for another day.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 19:37:04 +02:00
Jeff Mahoney
62e855771d btrfs: convert printk(KERN_* to use pr_* calls
This patch converts printk(KERN_* style messages to use the pr_* versions.

One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Jeff Mahoney
5d163e0e68 btrfs: unsplit printed strings
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

This patch unsplits user-visible strings.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Jeff Mahoney
cea67ab92d btrfs: clean the old superblocks before freeing the device
btrfs_rm_device frees the block device but then re-opens it using
the saved device name.  A race exists between the close and the
re-open that allows the block size to be changed.  The result
is getting stuck forever in the reclaim loop in __getblk_slow.

This patch moves the superblock cleanup before closing the block
device, which is also consistent with other callers.  We also don't
need a private copy of dev_name as the whole routine operates under
the uuid_mutex.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Josef Bacik
afcdd129e0 Btrfs: add a flags field to btrfs_fs_info
We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
is mostly equivalent with the exception of how we deal with quota going on or
off, now instead we set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 17:59:49 +02:00
Naohiro Aota
5d8eb6fe51 btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs
Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But
the work can be done by btrfs_delete_unused_bgs() (and it's better since it
trim the BG). Let's dedupe the code.

While btrfs_delete_unused_bgs() is already hitting the relocated BG, it
skip the BG since the BG has "ro" flag set (to keep balancing BG intact).
On the other hand, btrfs cannot drop "ro" flag here to prevent additional
writes. So this patch make use of "removed" flag.
btrfs_delete_unused_bgs() now detect the flag to distinguish whether a
read-only BG is relocating or not.

Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 17:59:49 +02:00
Linus Torvalds
28687b935e Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We've queued up a few different fixes in here.  These range from
  enospc corners to fsync and quota fixes, and a few targeted at error
  handling for corrupt metadata/fuzzing"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix lockdep warning on deadlock against an inode's log mutex
  Btrfs: detect corruption when non-root leaf has zero item
  Btrfs: check btree node's nritems
  btrfs: don't create or leak aliased root while cleaning up orphans
  Btrfs: fix em leak in find_first_block_group
  btrfs: do not background blkdev_put()
  Btrfs: clarify do_chunk_alloc()'s return value
  btrfs: fix fsfreeze hang caused by delayed iputs deal
  btrfs: update btrfs_space_info's bytes_may_use timely
  btrfs: divide btrfs_update_reserved_bytes() into two functions
  btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
  btrfs: qgroup: Fix qgroup incorrectness caused by log replay
  btrfs: relocation: Fix leaking qgroups numbers on data extents
  btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent()
  btrfs: waiting on qgroup rescan should not always be interruptible
  btrfs: properly track when rescan worker is running
  btrfs: flush_space: treat return value of do_chunk_alloc properly
  Btrfs: add ASSERT for block group's memory leak
  btrfs: backref: Fix soft lockup in __merge_refs function
  Btrfs: fix memory leak of reloc_root
2016-08-26 20:22:01 -07:00
Anand Jain
1423881941 btrfs: do not background blkdev_put()
At the end of unmount/dev-delete, if the device exclusive open is not
actually closed, then there might be a race with another program in
the userland who is trying to open the device in exclusive mode and
it may fail for eg:
      unmount /btrfs; fsck /dev/x
      btrfs dev del /dev/x /btrfs; fsck /dev/x
so here background blkdev_put() is not a choice

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:28 -07:00
Jens Axboe
1eff9d322a block: rename bio bi_rw to bi_opf
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07 14:41:02 -06:00
Linus Torvalds
d58b0d980f Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull more btrfs updates from Chris Mason:
 "This is part two of my btrfs pull, which is some cleanups and a batch
  of fixes.

  Most of the code here is from Jeff Mahoney, making the pointers we
  pass around internally more consistent and less confusing overall.  I
  noticed a small problem right before I sent this out yesterday, so I
  fixed it up and re-tested overnight"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
  Btrfs: fix __MAX_CSUM_ITEMS
  btrfs: btrfs_abort_transaction, drop root parameter
  btrfs: add btrfs_trans_handle->fs_info pointer
  btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
  btrfs: convert nodesize macros to static inlines
  btrfs: introduce BTRFS_MAX_ITEM_SIZE
  btrfs: cleanup, remove prototype for btrfs_find_root_ref
  btrfs: copy_to_sk drop unused root parameter
  btrfs: simpilify btrfs_subvol_inherit_props
  btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
  btrfs: tests, require fs_info for root
  btrfs: tests, move initialization into tests/
  btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
  btrfs: prefix fsid to all trace events
  btrfs: plumb fs_info into btrfs_work
  btrfs: remove obsolete part of comment in statfs
  btrfs: hide test-only member under ifdef
  btrfs: Ratelimit "no csum found" info message
  btrfs: Add ratelimit to btrfs printing
  Btrfs: fix unexpected balance crash due to BUG_ON
  ...
2016-08-04 19:56:16 -04:00
Linus Torvalds
d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Jeff Mahoney
66642832f0 btrfs: btrfs_abort_transaction, drop root parameter
__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer.  We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:26 +02:00
Jeff Mahoney
05f9a78012 btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
In btrfs_relocate_chunk, we get a transaction handle via
btrfs_start_trans_remove_block_group, which starts the transaction
using the extent root.  When we call btrfs_end_transaction, we're calling
it using the chunk root.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:25 +02:00
Jeff Mahoney
14a1e067b4 btrfs: introduce BTRFS_MAX_ITEM_SIZE
We use BTRFS_LEAF_DATA_SIZE - sizeof(struct btrfs_item) in
several places.  This introduces a BTRFS_MAX_ITEM_SIZE macro to do the
same.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:24 +02:00
Jeff Mahoney
3cdde2240d btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
btrfs_test_opt and friends only use the root pointer to access
the fs_info.  Let's pass the fs_info directly in preparation to
eliminate similar patterns all over btrfs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:53:16 +02:00
Liu Bo
5a488b9d2c Btrfs: fix unexpected balance crash due to BUG_ON
Mounting a btrfs can resume previous balance operations asynchronously.
An user got a crash when one drive has some corrupt sectors.

Since balance can cancel itself in case of any error, we can gracefully
return errors to upper layers and let balance do the cancel job.

Reported-by: sash <master.b.at.raven@chefmail.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:52:25 +02:00
Anand Jain
e2bf6e89b4 btrfs: make sure device is synced before return
An inconsistent behavior due to stale reads from the
disk was reported

  mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html

This patch will make sure devices are synced before
return in the unmount thread.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:52:25 +02:00
Anand Jain
f448341af9 btrfs: reorg btrfs_close_one_device()
Moves closer to the caller and removes declaration

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:52:25 +02:00
Vincent Stehlé
df5c82a8dc Btrfs: fix comparison in __btrfs_map_block()
Add missing comparison to op in expression, which was forgotten when doing
the REQ_OP transition.

Fixes: b3d3fa5199 ("btrfs: update __btrfs_map_block for REQ_OP transition")
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-18 15:28:23 -06:00
Linus Torvalds
b971712afc Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I have a two part pull this time because one of the patches Dave
  Sterba collected needed to be against v4.7-rc2 or higher (we used
  rc4).  I try to make my for-linus-xx branch testable on top of the
  last major so we can hand fixes to people on the list more easily, so
  I've split this pull in two.

  This first part has some fixes and two performance improvements that
  we've been testing for some time.

  Josef's two performance fixes are most notable.  The transid tracking
  patch makes a big improvement on pretty much every workload"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: Force stripesize to the value of sectorsize
  btrfs: fix disk_i_size update bug when fallocate() fails
  Btrfs: fix error handling in map_private_extent_buffer
  Btrfs: fix error return code in btrfs_init_test_fs()
  Btrfs: don't do nocow check unless we have to
  btrfs: fix deadlock in delayed_ref_async_start
  Btrfs: track transid for delayed ref flushing
2016-06-25 08:42:31 -07:00
Chandan Rajendra
b7f67055d2 Btrfs: Force stripesize to the value of sectorsize
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:42 -07:00
Linus Torvalds
4c6459f945 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The most user visible change here is a fix for our recent superblock
  validation checks that were causing problems on non-4k pagesized
  systems"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
  btrfs: remove build fixup for qgroup_account_snapshot
  btrfs: use new error message helper in qgroup_account_snapshot
  btrfs: avoid blocking open_ctree from cleaner_kthread
  Btrfs: don't BUG_ON() in btrfs_orphan_add
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  Btrfs: check if extent buffer is aligned to sectorsize
  btrfs: Use correct format specifier
2016-06-18 05:57:59 -10:00
Liu Bo
c871b0f2fd Btrfs: check if extent buffer is aligned to sectorsize
Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer().  An unaligned eb can have more pages than it
should have, which ends up extent buffer's leak or some corrupted content
in extent buffer.

This adds a warning to let us quickly know what was happening.

Now that alloc_extent_buffer() no more returns NULL, this changes its
caller and callers of its caller to match with the new error
handling.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Linus Torvalds
3d0f0b6a55 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Has some fixes and some new self tests for btrfs.  The self tests are
  usually disabled in the .config file (unless you're doing btrfs dev
  work), and this bunch is meant to find problems with the 64K page size
  patches.

  Jeff has a patch to help people see if they are using the hardware
  assist crc32c module, which really helps us nail down problems when
  people ask why crcs are using so much CPU.

  Otherwise, it's small fixes"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: self-tests: Fix extent buffer bitmap test fail on BE system
  Btrfs: self-tests: Fix test_bitmaps fail on 64k sectorsize
  Btrfs: self-tests: Use macros instead of constants and add missing newline
  Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes
  Btrfs: self-tests: Execute page straddling test only when nodesize < PAGE_SIZE
  btrfs: advertise which crc32c implementation is being used at module load
  Btrfs: add validadtion checks for chunk loading
  Btrfs: add more validation checks for superblock
  Btrfs: clear uptodate flags of pages in sys_array eb
  Btrfs: self-tests: Support non-4k page size
  Btrfs: Fix integer overflow when calculating bytes_per_bitmap
  Btrfs: test_check_exists: Fix infinite loop when searching for free space entries
  Btrfs: end transaction if we abort when creating uuid root
  btrfs: Use __u64 in exported linux/btrfs.h.
2016-06-10 14:13:27 -07:00
Chris Mason
719da39a61 Merge branch 'misc-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7 2016-06-08 14:36:12 -07:00
Mike Christie
81a75f6781 btrfs: use bio fields for op and flags
The bio REQ_OP and bi_rw rq_flag_bits are now always setup, so there is
no need to pass around the rq_flag_bits bits too. btrfs users should
should access the bio insead.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
b3d3fa5199 btrfs: update __btrfs_map_block for REQ_OP transition
We no longer pass in a bitmap of rq_flag_bits bits to __btrfs_map_block.
It will always be a REQ_OP, or the btrfs specific REQ_GET_READ_MIRRORS,
so this drops the bit tests.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
37226b2111 btrfs: use bio op accessors
This should be the easier cases to convert btrfs to
bio_set_op_attrs/bio_op.
They are mostly just cut and replace type of changes.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
4e49ea4a3d block/fs/drivers: remove rw argument from submit_bio
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.

Signed-off-by: Mike Christie <mchristi@redhat.com>

Fixed up fs/ext4/crypto.c

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Liu Bo
e06cd3dd7c Btrfs: add validadtion checks for chunk loading
To prevent fuzzed filesystem images from panic the whole system,
we need various validation checks to refuse to mount such an image
if btrfs finds any invalid value during loading chunks, including
both sys_array and regular chunks.

Note that these checks may not be sufficient to cover all corner cases,
feel free to add more checks.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06 10:57:09 +02:00
Liu Bo
99e3ecfcb9 Btrfs: add more validation checks for superblock
This adds validation checks for super_total_bytes, super_bytes_used and
super_stripesize, super_num_devices.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06 10:41:53 +02:00
Liu Bo
d865177a5e Btrfs: clear uptodate flags of pages in sys_array eb
We set uptodate flag to pages in the temporary sys_array eb,
but do not clear the flag after free eb.  As the special
btree inode may still hold a reference on those pages, the
uptodate flag can remain alive in them.

If btrfs_super_chunk_root has been intentionally changed to the
offset of this sys_array eb, reading chunk_root will read content
of sys_array and it will skip our beautiful checks in
btree_readpage_end_io_hook() because of
"pages of eb are uptodate => eb is uptodate"

This adds the 'clear uptodate' part to force it to read from disk.

Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-06 10:14:40 +02:00
Linus Torvalds
b2d5ad8223 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The important part of this pull is Filipe's set of fixes for btrfs
  device replacement.  Filipe fixed a few issues seen on the list and a
  number he found on his own"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent
  Btrfs: fix race between device replace and read repair
  Btrfs: fix race between device replace and discard
  Btrfs: fix race between device replace and chunk allocation
  Btrfs: fix race setting block group back to RW mode during device replace
  Btrfs: fix unprotected assignment of the left cursor for device replace
  Btrfs: fix race setting block group readonly during device replace
  Btrfs: fix race between device replace and block group removal
  Btrfs: fix race between readahead and device replace/removal
2016-06-04 11:56:28 -07:00
Josef Bacik
65d4f4c151 Btrfs: end transaction if we abort when creating uuid root
We still need to call btrfs_end_transaction if we call btrfs_abort_transaction,
otherwise we hang and make me super grumpy.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-01 00:32:42 +02:00
Filipe Manana
22ab04e814 Btrfs: fix race between device replace and chunk allocation
While iterating and copying extents from the source device, the device
replace code keeps adjusting a left cursor that is used to make sure that
once we finish processing a device extent, any future writes to extents
from the corresponding block group will get into both the source and
target devices. This left cursor is also used for resuming the device
replace operation at mount time.

However using this left cursor to decide whether writes go into both
devices or only the source device is not enough to guarantee we don't
miss copying extents into the target device. There are two cases where
the current approach fails. The first one is related to when there are
holes in the device and they get allocated for new block groups while
the device replace operation is iterating the device extents (more on
this explained below). The second one is that when that loop over the
device extents finishes, we start dellaloc, wait for all ordered extents
and then commit the current transaction, we might have got new block
groups allocated that are now using a device extent that has an offset
greater then or equals to the value of the left cursor, in which case
writes to extents belonging to these new block groups will get issued
only to the source device.

For the first case where the current approach of using a left cursor
fails, consider the source device currently has the following layout:

  [ extent bg A ] [ hole, unallocated space ] [extent bg B ]
  3Gb             4Gb                         5Gb

While we are iterating the device extents from the source device using
the commit root of the device tree, the following happens:

        CPU 1                                            CPU 2

                      <we are at transaction N>

  scrub_enumerate_chunks()
    --> searches the device tree for
        extents belonging to the source
        device using the device tree's
        commit root
    --> 1st iteration finds extent belonging to
        block group A

        --> sets block group A to RO mode
            (btrfs_inc_block_group_ro)

        --> sets cursor left to found_key.offset
            which is 3Gb

        --> scrub_chunk() starts
            copies all allocated extents from
            block group's A stripe at source
            device into target device

                                                           btrfs_alloc_chunk()
                                                             --> allocates device extent
                                                                 in the range [4Gb, 5Gb[
                                                                 from the source device for
                                                                 a new block group C

                                                           extent allocated from block
                                                           group C for a direct IO,
                                                           buffered write or btree node/leaf

                                                           extent is written to, perhaps
                                                           in response to a writepages()
                                                           call from the VM or directly
                                                           through direct IO

                                                           the write is made only against
                                                           the source device and not against
                                                           the target device because the
                                                           extent's offset is in the interval
                                                           [4Gb, 5Gb[ which is larger then
                                                           the value of cursor_left (3Gb)

        --> scrub_chunks() finishes

        --> updates left cursor from 3Gb to
            4Gb

        --> btrfs_dec_block_group_ro() sets
            block group A back to RW mode

                             <we are still at transaction N>

    --> 2nd iteration finds extent belonging to
        block group B - it did not find the new
        extent in the range [4Gb, 5Gb[ for block
        group C because we are using the device
        tree's commit root or even because the
        block group's items are not all yet
        inserted in the respective btrees, that is,
        the block group is still attached to some
        transaction handle's new_bgs list and
        btrfs_create_pending_block_groups() was
        not called yet against that transaction
        handle, so the device extent items were
        not yet inserted into the devices tree

                             <we are still at transaction N>

        --> so we end not copying anything from the newly
            allocated device extent from the source device
            to the target device

So fix this by making __btrfs_map_block() always redirect writes to the
target device as well, independently of the left cursor's value. With
this change the left cursor is now used only for the purpose of tracking
progress and allow a mount operation to resume a device replace.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2016-05-30 12:58:26 +01:00
Filipe Manana
57ba4cb85b Btrfs: fix race between device replace and block group removal
When it's finishing, the device replace code iterates all extent maps
representing block group and for each one that has a stripe that refers
to the source device, it replaces its device with the target device.
However when it replaces the source device with the target device it,
the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
only after its ID is changed to match the one from the source device.
This leads to races with the chunk removal code that can temporarly see
a device with an ID of 0ULL and then attempt to use that ID to remove
items from the device tree and fail, causing a transaction abort:

[ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
[ 9238.594377] ------------[ cut here ]------------
[ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594403] BTRFS: Transaction aborted (error 1)
[ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
oppy [last unloaded: btrfs]
[ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
[ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[ 9238.594421]  0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
[ 9238.594422]  0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
[ 9238.594423]  0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
[ 9238.594424] Call Trace:
[ 9238.594428]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
[ 9238.594430]  [<ffffffff81052b14>] __warn+0xc2/0xdd
[ 9238.594432]  [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
[ 9238.594434]  [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
[ 9238.594450]  [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594452]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
[ 9238.594464]  [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
[ 9238.594476]  [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
[ 9238.594489]  [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
[ 9238.594490]  [<ffffffff8106f403>] kthread+0xd4/0xdc
[ 9238.594494]  [<ffffffff8149e242>] ret_from_fork+0x22/0x40
[ 9238.594495]  [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
[ 9238.594496] ---[ end trace 183efbe50275f059 ]---

The sequence of steps leading to this is like the following:

              CPU 1                                           CPU 2

 btrfs_dev_replace_finishing()

   at this point
   dev_replace->tgtdev->devid ==
   BTRFS_DEV_REPLACE_DEVID (0ULL)

   ...

   btrfs_start_transaction()
   btrfs_commit_transaction()

                                                     btrfs_delete_unused_bgs()
                                                       btrfs_remove_chunk()

                                                         looks up for the extent map
                                                         corresponding to the chunk

                                                         lock_chunks() (chunk_mutex)
                                                         check_system_chunk()
                                                         unlock_chunks() (chunk_mutex)

   locks fs_info->chunk_mutex

   btrfs_dev_replace_update_device_in_mapping_tree()
     --> iterates fs_info->mapping_tree and
         replaces the device in every extent
         map's map->stripes[] with
         dev_replace->tgtdev, which still has
         an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)

                                                         iterates over all stripes from
                                                         the extent map

                                                           --> calls btrfs_free_dev_extent()
                                                               passing it the target device
                                                               that still has an ID of 0ULL

                                                           --> btrfs_free_dev_extent() fails
                                                             --> aborts current transaction

   finishes setting up the target device,
   namely it sets tgtdev->devid to the value
   of srcdev->devid (which is necessarily > 0)

   frees the srcdev

   unlocks fs_info->chunk_mutex

So fix this by taking the device list mutex while processing the stripes
for the chunk's extent map. This is similar to the race between device
replace and block group creation that was fixed by commit 50460e3718
("Btrfs: fix race when finishing dev replace leading to transaction abort").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2016-05-30 12:58:19 +01:00
Linus Torvalds
559b6d90a0 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs cleanups and fixes from Chris Mason:
 "We have another round of fixes and a few cleanups.

  I have a fix for short returns from btrfs_copy_from_user, which
  finally nails down a very hard to find regression we added in v4.6.

  Dave is pushing around gfp parameters, mostly to cleanup internal apis
  and make it a little more consistent.

  The rest are smaller fixes, and one speelling fixup patch"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits)
  Btrfs: fix handling of faults from btrfs_copy_from_user
  btrfs: fix string and comment grammatical issues and typos
  btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
  Btrfs: fix unexpected return value of fiemap
  Btrfs: free sys_array eb as soon as possible
  btrfs: sink gfp parameter to convert_extent_bit
  btrfs: make state preallocation more speculative in __set_extent_bit
  btrfs: untangle gotos a bit in convert_extent_bit
  btrfs: untangle gotos a bit in __clear_extent_bit
  btrfs: untangle gotos a bit in __set_extent_bit
  btrfs: sink gfp parameter to set_record_extent_bits
  btrfs: sink gfp parameter to set_extent_new
  btrfs: sink gfp parameter to set_extent_defrag
  btrfs: sink gfp parameter to set_extent_delalloc
  btrfs: sink gfp parameter to clear_extent_dirty
  btrfs: sink gfp parameter to clear_record_extent_bits
  btrfs: sink gfp parameter to clear_extent_bits
  btrfs: sink gfp parameter to set_extent_bits
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-27 16:37:36 -07:00
David Sterba
42f31734eb Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
Nicholas D Steeves
0132761017 btrfs: fix string and comment grammatical issues and typos
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 22:35:14 +02:00
Liu Bo
1c8b5b6e8b Btrfs: free sys_array eb as soon as possible
While reading sys_chunk_array in superblock, btrfs creates a temporary
extent buffer.  Since we don't use it after finishing reading
 sys_chunk_array, we don't need to keep it in memory.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 19:53:51 +02:00
Linus Torvalds
07be1337b9 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This has our merge window series of cleanups and fixes.  These target
  a wide range of issues, but do include some important fixes for
  qgroups, O_DIRECT, and fsync handling.  Jeff Mahoney moved around a
  few definitions to make them easier for userland to consume.

  Also whiteout support is included now that issues with overlayfs have
  been cleared up.

  I have one more fix pending for page faults during btrfs_copy_from_user,
  but I wanted to get this bulk out the door first"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (90 commits)
  btrfs: fix memory leak during RAID 5/6 device replacement
  Btrfs: add semaphore to synchronize direct IO writes with fsync
  Btrfs: fix race between block group relocation and nocow writes
  Btrfs: fix race between fsync and direct IO writes for prealloc extents
  Btrfs: fix number of transaction units for renames with whiteout
  Btrfs: pin logs earlier when doing a rename exchange operation
  Btrfs: unpin logs if rename exchange operation fails
  Btrfs: fix inode leak on failure to setup whiteout inode in rename
  btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT
  Btrfs: pin log earlier when renaming
  Btrfs: unpin log if rename operation fails
  Btrfs: don't do unnecessary delalloc flushes when relocating
  Btrfs: don't wait for unrelated IO to finish before relocation
  Btrfs: fix empty symlink after creating symlink and fsync parent dir
  Btrfs: fix for incorrect directory entries after fsync log replay
  btrfs: build fixup for qgroup_account_snapshot
  btrfs: qgroup: Fix qgroup accounting when creating snapshot
  Btrfs: fix fspath error deallocation
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-21 10:49:22 -07:00
Andy Shevchenko
8da4b8c48e lib/uuid.c: move generate_random_uuid() to uuid.c
Let's gather the UUID related functions under one hood.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
David Sterba
36fac9e9ff Merge branch 'foreign/anand/dev-del-by-id-ext' into for-chris-4.7-20160516 2016-05-16 15:46:26 +02:00
David Sterba
5ef64a3e75 Merge branch 'cleanups-4.7' into for-chris-4.7-20160516 2016-05-16 15:46:24 +02:00
Anand Jain
48b3b9d401 btrfs: fix lock dep warning move scratch super outside of chunk_mutex
Move scratch super outside of the chunk lock to avoid below
lockdep warning. The better place to scratch super is in
the function btrfs_rm_dev_replace_free_srcdev() just before
free_device, which is outside of the chunk lock as well.

To reproduce:
  (fresh boot)
  mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde
  mount /dev/sdc /btrfs
  dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
  (get devmgt from https://github.com/asj/devmgt.git)
  devmgt detach /dev/sde
  dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
  sync
  btrfs replace start -Brf 3 /dev/sdf /btrfs <--
  devmgt attach host7

======================================================
[ INFO: possible circular locking dependency detected ]
4.6.0-rc2asj+ #1 Not tainted
---------------------------------------------------

btrfs/2174 is trying to acquire lock:
(sb_writers){.+.+.+}, at:
[<ffffffff812449b4>] __sb_start_write+0xb4/0xf0

but task is already holding lock:
(&fs_info->chunk_mutex){+.+.+.}, at:
[<ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs]

which lock already depends on the new lock.

Chain exists of:
sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex
Possible unsafe locking scenario:
CPU0				CPU1
----				----
lock(&fs_info->chunk_mutex);
				lock(&fs_devs->device_list_mutex);
				lock(&fs_info->chunk_mutex);
lock(sb_writers);

*** DEADLOCK ***

-> #0 (sb_writers){.+.+.+}:
[<ffffffff810e6415>] __lock_acquire+0x1bc5/0x1ee0
[<ffffffff810e707e>] lock_acquire+0xbe/0x210
[<ffffffff810df49a>] percpu_down_read+0x4a/0xa0
[<ffffffff812449b4>] __sb_start_write+0xb4/0xf0
[<ffffffff81265534>] mnt_want_write+0x24/0x50
[<ffffffff812508a2>] path_openat+0x952/0x1190
[<ffffffff81252451>] do_filp_open+0x91/0x100
[<ffffffff8123f5cc>] file_open_name+0xfc/0x140
[<ffffffff8123f643>] filp_open+0x33/0x60
[<ffffffffa0572bb6>] update_dev_time+0x16/0x40 [btrfs]
[<ffffffffa057f60d>] btrfs_scratch_superblocks+0x5d/0xb0 [btrfs]
[<ffffffffa057f70e>] btrfs_rm_dev_replace_remove_srcdev+0xae/0xd0 [btrfs]
[<ffffffffa05c62c5>] btrfs_dev_replace_finishing+0x4b5/0x980 [btrfs]
[<ffffffffa05c6ae8>] btrfs_dev_replace_start+0x358/0x530 [btrfs]

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06 15:22:49 +02:00
Josef Bacik
e042d1ec44 Btrfs: remove BUG_ON()'s in btrfs_map_block
btrfs_map_block can go horribly wrong in the face of fs corruption, lets agree
to not be assholes and panic at any possible chance things are all fucked up.

Signed-off-by: Josef Bacik <jbacik@fb.com>
[ removed type casts ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06 15:22:49 +02:00
Liu Bo
3d8da67817 Btrfs: fix divide error upon chunk's stripe_len
The struct 'map_lookup' uses type int for @stripe_len, while
btrfs_chunk_stripe_len() can return a u64 value, and it may end up with
@stripe_len being undefined value and it can lead to 'divide error' in
 __btrfs_map_block().

This changes 'map_lookup' to use type u64 for stripe_len, also right now
we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a valid checker for
BTRFS_STRIPE_LEN.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ folded division fix to scrub_raid56_parity ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06 15:22:49 +02:00
Anand Jain
779bf3fefa btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex
When the replace target fails, the target device will be taken
out of fs device list, scratch + update_dev_time and freed. However
we could do the scratch  + update_dev_time and free part after the
device has been taken out of device list, so that we don't have to
hold the device_list_mutex and uuid_mutex locks.

Reported issue:

[ 5375.718845] ======================================================
[ 5375.718846] [ INFO: possible circular locking dependency detected ]
[ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted
[ 5375.718849] -------------------------------------------------------
[ 5375.718851] btrfs-health/4662 is trying to acquire lock:
[ 5375.718861]  (sb_writers){.+.+.+}, at: [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.718862]
[ 5375.718862] but task is already holding lock:
[ 5375.718907]  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
[ 5375.718907]
[ 5375.718907] which lock already depends on the new lock.
[ 5375.718907]
[ 5375.718908]
[ 5375.718908] the existing dependency chain (in reverse order) is:
[ 5375.718911]
[ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}:
[ 5375.718917]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718921]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
[ 5375.718940]        [<ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs]
[ 5375.718945]        [<ffffffff81267079>] show_vfsmnt+0x49/0x150
[ 5375.718948]        [<ffffffff81240b07>] m_show+0x17/0x20
[ 5375.718951]        [<ffffffff81246868>] seq_read+0x2d8/0x3b0
[ 5375.718955]        [<ffffffff8121df28>] __vfs_read+0x28/0xd0
[ 5375.718959]        [<ffffffff8121e806>] vfs_read+0x86/0x130
[ 5375.718962]        [<ffffffff8121f4c9>] SyS_read+0x49/0xa0
[ 5375.718966]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.718968]
[ 5375.718968] -> #2 (namespace_sem){+++++.}:
[ 5375.718971]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718974]        [<ffffffff81635199>] down_write+0x49/0x80
[ 5375.718977]        [<ffffffff81243593>] lock_mount+0x43/0x1c0
[ 5375.718979]        [<ffffffff81243c13>] do_add_mount+0x23/0xd0
[ 5375.718982]        [<ffffffff81244afb>] do_mount+0x27b/0xe30
[ 5375.718985]        [<ffffffff812459dc>] SyS_mount+0x8c/0xd0
[ 5375.718988]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.718991]
[ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
[ 5375.718994]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.718996]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
[ 5375.719001]        [<ffffffff8122d608>] path_openat+0x468/0x1360
[ 5375.719004]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719007]        [<ffffffff8121da7b>] do_sys_open+0x12b/0x210
[ 5375.719010]        [<ffffffff8121db7e>] SyS_open+0x1e/0x20
[ 5375.719013]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 5375.719015]
[ 5375.719015] -> #0 (sb_writers){.+.+.+}:
[ 5375.719018]        [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
[ 5375.719021]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.719026]        [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
[ 5375.719028]        [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.719031]        [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
[ 5375.719035]        [<ffffffff8122ded2>] path_openat+0xd32/0x1360
[ 5375.719037]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719040]        [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
[ 5375.719043]        [<ffffffff8121d923>] filp_open+0x33/0x60
[ 5375.719073]        [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
[ 5375.719099]        [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
[ 5375.719123]        [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
[ 5375.719150]        [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
[ 5375.719175]        [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
[ 5375.719199]        [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
[ 5375.719222]        [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
[ 5375.719225]        [<ffffffff810a70df>] kthread+0xef/0x110
[ 5375.719229]        [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
[ 5375.719230]
[ 5375.719230] other info that might help us debug this:
[ 5375.719230]
[ 5375.719233] Chain exists of:
[ 5375.719233]   sb_writers --> namespace_sem --> &fs_devs->device_list_mutex
[ 5375.719233]
[ 5375.719234]  Possible unsafe locking scenario:
[ 5375.719234]
[ 5375.719234]        CPU0                    CPU1
[ 5375.719235]        ----                    ----
[ 5375.719236]   lock(&fs_devs->device_list_mutex);
[ 5375.719238]                                lock(namespace_sem);
[ 5375.719239]                                lock(&fs_devs->device_list_mutex);
[ 5375.719241]   lock(sb_writers);
[ 5375.719241]
[ 5375.719241]  *** DEADLOCK ***
[ 5375.719241]
[ 5375.719243] 4 locks held by btrfs-health/4662:
[ 5375.719266]  #0:  (&fs_info->health_mutex){+.+.+.}, at: [<ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs]
[ 5375.719293]  #1:  (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs]
[ 5375.719319]  #2:  (uuid_mutex){+.+.+.}, at: [<ffffffffa0282620>] btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs]
[ 5375.719343]  #3:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
[ 5375.719343]
[ 5375.719343] stack backtrace:
[ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40
[ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015
[ 5375.719352]  0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0
[ 5375.719354]  ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930
[ 5375.719357]  ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004
[ 5375.719357] Call Trace:
[ 5375.719363]  [<ffffffff813529e3>] dump_stack+0x85/0xc2
[ 5375.719366]  [<ffffffff810d667c>] print_circular_bug+0x1ec/0x260
[ 5375.719369]  [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
[ 5375.719373]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 5375.719376]  [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
[ 5375.719378]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
[ 5375.719383]  [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
[ 5375.719385]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
[ 5375.719387]  [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
[ 5375.719389]  [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
[ 5375.719393]  [<ffffffff8122ded2>] path_openat+0xd32/0x1360
[ 5375.719415]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
[ 5375.719418]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 5375.719420]  [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
[ 5375.719423]  [<ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80
[ 5375.719426]  [<ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0
[ 5375.719430]  [<ffffffff8122e7d4>] ? getname_kernel+0x34/0x120
[ 5375.719433]  [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
[ 5375.719436]  [<ffffffff8121d923>] filp_open+0x33/0x60
[ 5375.719462]  [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
[ 5375.719485]  [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
[ 5375.719506]  [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
[ 5375.719530]  [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
[ 5375.719554]  [<ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs]
[ 5375.719576]  [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
[ 5375.719598]  [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
[ 5375.719621]  [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
[ 5375.719641]  [<ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs]
[ 5375.719661]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
[ 5375.719663]  [<ffffffff810a70df>] kthread+0xef/0x110
[ 5375.719666]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
[ 5375.719669]  [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
[ 5375.719672]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
[ 5375.719697] ------------[ cut here ]------------

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06 15:22:49 +02:00
Austin S. Hemmelgarn
88be159c90 btrfs: allow balancing to dup with multi-device
Currently, we don't allow the user to try and rebalance to a dup profile
on a multi-device filesystem.  In most cases, this is a perfectly sensible
restriction as raid1 uses the same amount of space and provides better
protection.

However, when reshaping a multi-device filesystem down to a single device
filesystem, this requires the user to convert metadata and system chunks
to single profile before deleting devices, and then convert again to dup,
which leaves a period of time where metadata integrity is reduced.

This patch removes the single-device-only restriction from converting to
dup profile to remove this potential data integrity reduction.

Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-06 15:22:49 +02:00
Anand Jain
88acff64c6 btrfs: cleanup assigning next active device with a check
Creates helper fucntion as needed by the device delete
and replace operations. Also now it checks if the next
device being assigned is an active device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-04 10:41:08 +02:00
Anand Jain
8ed01abe7d btrfs: s_bdev is not null after missing replace
Yauhen reported in the ML that s_bdev is null at mount, and
s_bdev gets updated to some device when missing device is
replaced, as because bdev is null for missing device, things
gets matched up. Fix this by checking if s_bdev is set. I
didn't want to completely remove updating s_bdev because
the future multi device support at vfs layer may need it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-04 09:52:44 +02:00
David Sterba
5c5c0df05d btrfs: rename btrfs_find_device_by_user_input
For clarity how we are going to find the device, let's call it a device
specifier, devspec for short. Also rename the arguments that are a
leftover from previous function purpose.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
418775a22b btrfs: use existing device constraints table btrfs_raid_array
We should avoid duplicating the device constraints, let's use the
btrfs_raid_array in btrfs_check_raid_min_devices.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
621292bae6 btrfs: introduce raid-type to error-code table, for minimum device constraint
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
3cc31a0d5b btrfs: pass number of devices to btrfs_check_raid_min_devices
Before this patch, btrfs_check_raid_min_devices would do an off-by-one
check of the constraints and not the miminmum check, as its name
suggests. This is not a problem if the only caller is device remove, but
would be confusing for others.

Add an argument with the exact number and let the caller(s) decide if
this needs any adjustments, like when device replace is running.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
f47ab2588e btrfs: rename __check_raid_min_devices
Underscores are for special functions, use the full prefix for better
stacktrace recognition.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
02feae3c55 btrfs: optimize check for stale device
Optimize check for stale device to only be checked when there is device
added or changed. If there is no update to the device, there is no need
to call btrfs_free_stale_device().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
6b526ed70c btrfs: introduce device delete by devid
This introduces new ioctl BTRFS_IOC_RM_DEV_V2, which uses enhanced struct
btrfs_ioctl_vol_args_v2 to carry devid as an user argument.

The patch won't delete the old ioctl interface and so kernel remains
backward compatible with user land progs.

Test case/script:
echo "0 $(blockdev --getsz /dev/sdf) linear /dev/sdf 0" | dmsetup create bad_disk
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdd /dev/sde /dev/mapper/bad_disk
mount /dev/sdd /btrfs
dmsetup suspend bad_disk
echo "0 $(blockdev --getsz /dev/sdf) error /dev/sdf 0" | dmsetup load bad_disk
dmsetup resume bad_disk
echo "bad disk failed. now deleting/replacing"
btrfs dev del  3  /btrfs
echo $?
btrfs fi show /btrfs
umount /btrfs
btrfs-show-super /dev/sdd | egrep num_device
dmsetup remove bad_disk
wipefs -a /dev/sdf

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Martin <m_btrfs@ml1.co.uk>
[ adjust messages, s/disk/device/ ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
42b6742715 btrfs: make use of btrfs_scratch_superblocks() in btrfs_rm_device()
With the previous patches now the btrfs_scratch_superblocks() is ready to
be used in btrfs_rm_device() so use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ use GFP_KERNEL ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
b3d1b1532f btrfs: enhance btrfs_find_device_by_user_input() to check device path
The operation of device replace and device delete follows same steps upto
some depth with in btrfs kernel, however they don't share codes. This
enhancement will help replace and delete to share codes.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
24fc572fe4 btrfs: make use of btrfs_find_device_by_user_input()
btrfs_rm_device() has a section of the code which can be replaced
btrfs_find_device_by_user_input()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
24e0474b59 btrfs: create helper btrfs_find_device_by_user_input()
The patch renames btrfs_dev_replace_find_srcdev() to
btrfs_find_device_by_user_input() and moves it to volumes.c, so that
delete device can use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00