Commit graph

754080 commits

Author SHA1 Message Date
Nikolay Borisov
719fb4de55 btrfs: Remove fs_info argument from convert_free_space_to_bitmaps
This function already takes a transaction handle which contains a
reference to fs_info. So use that and remove the extra argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:34 +02:00
Nikolay Borisov
f3f7277995 btrfs: Remove fs_info parameter from remove_block_group_free_space
This function always takes a trans handle which contains a reference to
the fs_info. Use that and remove the extra argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:34 +02:00
Nikolay Borisov
4457c1c702 btrfs: Remove fs_info argument from add_new_free_space
This function also takes a btrfs_block_group_cache which contains a
referene to the fs_info. So use that and remove the extra argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:33 +02:00
Nikolay Borisov
66afee1848 btrfs: Remove fs_info parameter from add_new_free_space_info
This function already takes trans handle from where fs_info can be
referenced. Remove the redundant parameter.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:33 +02:00
Nikolay Borisov
2d5cffa1b0 btrfs: Remove fs_info argument from __add_to_free_space_tree
This function already takes a transaction handle which contains a
reference to fs_info.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:33 +02:00
Nikolay Borisov
9a7e0f9284 btrfs: Remove fs_info argument from __add_block_group_free_space
This function already takes a transaction handle which has a reference
to the fs_info.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:33 +02:00
Nikolay Borisov
e4e0711cd9 btrfs: Remove fs_info argument from add_block_group_free_space
We also pass in a transaction handle which has a reference to the
fs_info. Just remove the extraneous argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:33 +02:00
Nikolay Borisov
483bce068e btrfs: Make btrfs_init_dummy_trans initialize trans' fs_info field
This will be necessary for future cleanups which remove the fs_info
argument from some freespace tree functions.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:32 +02:00
Nikolay Borisov
7c8a0d363a btrfs: Add assert in __btrfs_del_delalloc_inode
The invariant is that when nr_delalloc_inodes is 0 then the root
mustn't have any inodes on its delalloc inodes list.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:32 +02:00
Robbie Ko
0f96f517dc btrfs: incremental send, improve rmdir performance for large directory
Currently when checking if a directory can be deleted, we always check
if all its children have been processed.

Example: A directory with 2,000,000 files was deleted

original: 1994m57.071s
patch:       1m38.554s

[FIX]
Instead of checking all children on all calls to can_rmdir(), we keep
track of the directory index offset of the child last checked in the
last call to can_rmdir(), and then use it as the starting point for
future calls to can_rmdir().

Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:32 +02:00
Robbie Ko
35c8eda12f btrfs: incremental send, move allocation until it's needed in orphan_dir_info
Move the allocation after the search when it's clear that the new entry
will be added.

Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:32 +02:00
Nikolay Borisov
2335efafa6 btrfs: split delayed ref head initialization and addition
add_delayed_ref_head really performed 2 independent operations -
initialisting the ref head and adding it to a list. Now that the init
part is in a separate function let's complete the separation between
both operations. This results in a lot simpler interface for
add_delayed_ref_head since the function now deals solely with either
adding the newly initialised delayed ref head or merging it into an
existing delayed ref head. This results in vastly simplified function
signature since 5 arguments are dropped. The only other thing worth
mentioning is that due to this split the WARN_ON catching reinit of
existing. In this patch the condition is extended such that:

  qrecord && head_ref->qgroup_ref_root && head_ref->qgroup_reserved

is added. This is done because the two qgroup_* prefixed member are
set only if both ref_root and reserved are passed. So functionally
it's equivalent to the old WARN_ON and allows to remove the two args
from add_delayed_ref_head.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:32 +02:00
Nikolay Borisov
eb86ec73b9 btrfs: Use init_delayed_ref_head in add_delayed_ref_head
Use the newly introduced function when initialising the head_ref in
add_delayed_ref_head. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:31 +02:00
Nikolay Borisov
a2e569b3f2 btrfs: Introduce init_delayed_ref_head
add_delayed_ref_head implements the logic to both initialize a head_ref
structure as well as perform the necessary operations to add it to the
delayed ref machinery. This has resulted in a very cumebrsome interface
with loads of parameters and code, which at first glance, looks very
unwieldy. Begin untangling it by first extracting the initialization
only code in its own function. It's more or less verbatim copy of the
first part of add_delayed_ref_head.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:31 +02:00
Nikolay Borisov
cd7f9699b1 btrfs: Open-code add_delayed_data_ref
Now that the initialization part and the critical section code have been
split it's a lot easier to open code add_delayed_data_ref. Do so in the
following manner:

1. The common init function is put immediately after memory-to-be-initialized
   is allocated, followed by the specific data ref initialization.

2. The only piece of code that remains in the critical section is
   insert_delayed_ref call.

3. Tracing and memory freeing code is moved outside of the critical
   section.

No functional changes, just an overall shorter critical section.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:31 +02:00
Nikolay Borisov
70d640004a btrfs: Open-code add_delayed_tree_ref
Now that the initialization part and the critical section code have been
split it's a lot easier to open code add_delayed_tree_ref. Do so in the
following manner:

1. The comming init code is put immediately after memory-to-be-initialized
   is allocated, followed by the ref-specific member initialization.

2. The only piece of code that remains in the critical section is
   insert_delayed_ref call.

3. Tracing and memory freeing code is put outside of the critical
   section as well.

The only real change here is an overall shorter critical section when
dealing with delayed tree refs. From functional point of view - the code
is unchanged.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:31 +02:00
Nikolay Borisov
c812c8a857 btrfs: Use init_delayed_ref_common in add_delayed_data_ref
Use the newly introduced helper and remove the duplicate code.  No
functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:31 +02:00
Nikolay Borisov
646f4dd76f btrfs: Use init_delayed_ref_common in add_delayed_tree_ref
Use the newly introduced common helper.  No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:30 +02:00
Nikolay Borisov
cb49a87b2a btrfs: Factor out common delayed refs init code
THe majority of the init code for struct btrfs_delayed_ref_node is
duplicated in add_delayed_data_ref and add_delayed_tree_ref. Factor out
the common bits in init_delayed_ref_common. This function is going to be
used in future patches to clean that up. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:30 +02:00
Chengguang Xu
891f41cb27 btrfs: return original error code when failing from option parsing
It's not good to overwrite -ENOMEM using -EINVAL when failing from mount
option parsing, so just return original error code.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:30 +02:00
David Sterba
6fcf6e2bff btrfs: remove redundant btrfs_balance_control::fs_info
The fs_info is always available from the context so we don't need to
store it in the structure.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:30 +02:00
Qu Wenruo
c9f6f3cd1c btrfs: qgroup: Allow trace_btrfs_qgroup_account_extent() to record its transid
When debugging quota rescan race, some times btrfs rescan could account
some old (committed) leaf and then re-account newly committed leaf
in next generation.

This race needs extra transid to locate, so add @transid for
trace_btrfs_qgroup_account_extent() for such debug.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:30 +02:00
Colin Ian King
f5686e3acd btrfs: send: fix spelling mistake: "send_in_progres" -> "send_in_progress"
Trivial fix to spelling mistake of function name in btrfs_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:29 +02:00
Nikolay Borisov
63a9c7b9ce btrfs: Remove devid parameter from btrfs_rmap_block
This function is used in only one place and devid argument is always
passed 0. So just remove it, similarly to how it was removed in the
userspace code.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:29 +02:00
Qu Wenruo
8b317901da btrfs: trace: Allow trace_qgroup_update_counters() to record old rfer/excl value
Origin trace_qgroup_update_counters() only records qgroup id and its
reference count change.

It's good enough to debug qgroup accounting change, but when rescan race
is involved, it's pretty hard to distinguish which modification belongs
to which rescan.

So add old_rfer and old_excl trace output to help distinguishing
different rescan instance.
(Different rescan instance should reset its qgroup->rfer to 0)

For trace event parameter, it just changes from u64 qgroup_id to struct
btrfs_qgroup *qgroup, so number of parameters is not changed at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:29 +02:00
Nikolay Borisov
3a2f8c07e1 btrfs: Unexport btrfs_alloc_delalloc_work
It's used only in inode.c so makes no sense to have it exported. Also
move the definition of btrfs_delalloc_work to inode.c since it's used
only this file.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:29 +02:00
Nikolay Borisov
076da91cd9 btrfs: Remove delayed_iput member from btrfs_delalloc_work
When allocating a delalloc work we are always setting the delayed_iput
to 0. So remove the delay_iput member of btrfs_delalloc_work, as a
result also remove it as a parameter from btrfs_alloc_delalloc_work
since it's not used anymore.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:29 +02:00
Nikolay Borisov
4fbb514785 btrfs: Remove delay_iput parameter from __start_delalloc_inodes
It's always set to 0 so remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ rename to start_delalloc_inodes ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:28 +02:00
Nikolay Borisov
76f32e240e btrfs: Remove delayed_iput parameter from btrfs_start_delalloc_inodes
It's always set to 0, so just remove it and collapse the constant value
to the only function we are passing it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:28 +02:00
Nikolay Borisov
82b3e53b8d btrfs: Remove delayed_iput parameter of btrfs_start_delalloc_roots
This parameter was introduced alongside the function in
eb73c1b7ce ("Btrfs: introduce per-subvolume delalloc inode list") to
avoid deadlocks since this function was used in the transaction commit
path. However, commit 8d875f95da ("btrfs: disable strict file flushes
for renames and truncates") removed that usage, rendering the parameter
obsolete.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:28 +02:00
Gu Jinxiang
0338dff6e0 btrfs: do reverse path readahead in btrfs_shrink_device
In btrfs_shrink_device, before btrfs_search_slot, path->reada is set to
READA_FORWARD. But I think READA_BACK is correct.

Since:

 1. key.offset is set to (u64)-1
 2. after btrfs_search_slot, btrfs_previous_item is called

So, for readahead previous items, READA_BACK is the correct one.

Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:28 +02:00
Qu Wenruo
4ed0a7a3b7 btrfs: trace: Add trace points for unused block groups
This patch will add the following trace events:
1) btrfs_remove_block_group
   For btrfs_remove_block_group() function.
   Triggered when a block group is really removed.

2) btrfs_add_unused_block_group
   Triggered which block group is added to unused_bgs list.

3) btrfs_skip_unused_block_group
   Triggered which unused block group is not deleted.

These trace events is pretty handy to debug case related to block group
auto remove.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:28 +02:00
Qu Wenruo
3dca5c942d btrfs: trace: Remove unnecessary fs_info parameter for btrfs__reserve_extent event class
fs_info can be extracted from btrfs_block_group_cache, and all
btrfs_block_group_cache is created by btrfs_create_block_group_cache()
with fs_info initialized, no need to worry about NULL pointer
dereference.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:27 +02:00
Gu Jinxiang
9113493e3a btrfs: remove unused fs_info parameter
Since the commit c6100a4b4e ("Btrfs: replace tree->mapping with
tree->private_data"), parameter fs_info in alloc_reloc_control is
not used. So remove it.

Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:27 +02:00
Anand Jain
f9fbcaa2a3 btrfs: move btrfs_raid_mindev_errorvalues to btrfs_raid_attr table
Add a new member struct btrfs_raid_attr::mindev_error so that
btrfs_raid_array can maintain the error code to return if the minimum
number of devices condition is not met while trying to delete a device
in the given raid. And so we can drop btrfs_raid_mindev_error.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:27 +02:00
Anand Jain
41a6e8913c btrfs: move btrfs_raid_group values to btrfs_raid_attr table
Add a new member struct btrfs_raid_attr::bg_flag so that
btrfs_raid_array can maintain the bit map flag of the raid type, and
so we can drop btrfs_raid_group.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:27 +02:00
Anand Jain
ed23467b18 btrfs: move btrfs_raid_type_names values to btrfs_raid_attr table
Add a new member struct btrfs_raid_attr::raid_name so that
btrfs_raid_array can maintain the name of the raid type, and so we can
drop btrfs_raid_type_names.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:27 +02:00
Qu Wenruo
b545993694 btrfs: print-tree: Add eb locking status output for debug build
It's pretty handy if we can get the debug output for locking status of
an extent buffer, specially for race condition related debugging.

So add the following output for btrfs_print_tree() and
btrfs_print_leaf():
- refs
- write_locks (as w:%d)
- read_locks (as r:%d)
- blocking_writers (as bw:%d)
- blocking_readers (as br:%d)
- spinning_writers (as sw:%d)
- spinning_readers (as sr:%d)
- lock_owner
- current->pid

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:26 +02:00
David Sterba
833aae18fc btrfs: open code set_balance_control
The helper is quite simple and I'd like to see the locking in the
caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:26 +02:00
David Sterba
1354e1a13e btrfs: use mutex in btrfs_resume_balance_async
While the spinlock does not cause problems, using the mutex is more
correct and consistent with others. The global status of balance is eg.
checked from btrfs_pause_balance or btrfs_cancel_balance with mutex.

Resuming balance happens during mount or ro->rw remount. In the former
case, no other user of the balance_ctl exists, in the latter, balance
cannot run until the ro/rw transition is finished.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:26 +02:00
David Sterba
008ef0969d btrfs: drop lock parameter from update_ioctl_balance_args and rename
The parameter controls locking of the stats part but we can lock it
unconditionally, as this only happens once when balance starts. This is
not performance critical.

Add the prefix for an exported function.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:26 +02:00
David Sterba
cf7d20f447 btrfs: move and comment read-only check in btrfs_cancel_balance
Balance cannot be started on a read-only filesystem and will have to
finish/exit before eg. going to read-only via remount.

In case the filesystem is forcibly set to read-only after an error,
balance will finish anyway and if the cancel call is too fast it will
just wait for that to happen.

The last case is when the balance is paused after mount but it's
read-only and cancelling would want to delete the item. The test is
moved after the check if balance is running at all, as it looks more
logical to report "no balance running" instead of "read-only
filesystem".

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:26 +02:00
David Sterba
3009a62f3b btrfs: track running balance in a simpler way
Currently fs_info::balance_running is 0 or 1 and does not use the
semantics of atomics. The pause and cancel check for 0, that can happen
only after __btrfs_balance exits for whatever reason.

Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
times but will block on the balance_mutex that protects the
fs_info::flags bit.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:25 +02:00
David Sterba
dccdb07bc9 btrfs: kill btrfs_fs_info::volume_mutex
Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00addc7 ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
   add/del/resize copy the EXCL_OP bit next to the volume mutex, here
   it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
   EXCL_OP bit and the volume_mutex is held in the kernel thread that
   calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
   after balance and is excluded by the EXCL_OP bit. It does not take
   the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot hapen at the same time
   as the ioctls cannot be started in parallel. Nevertheless, a crafted
   image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses own mutex to
   protect against concurrent access to its status data. There's some
   trickery to maintain the right lock nesting in case we need to
   reexamine the status in btrfs_ioctl_balance. The volume_mutex is
   removed and the unlock/lock sequence is left in place as we might
   expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
   btrfs_cancel_balance to allow waiters to continue.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:25 +02:00
David Sterba
a0fecc2371 btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start
The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:25 +02:00
David Sterba
149196a2ae btrfs: cleanup helpers that reset balance state
The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:25 +02:00
David Sterba
eee95e3fb0 btrfs: add sanity check when resuming balance after mount
Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.

Also document that the paused balance starts here and owns the exclusive
op status.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:25 +02:00
David Sterba
010a47bde9 btrfs: add proper safety check before resuming dev-replace
The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.

The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations run. In this case, the
balance can be also paused and resumed under same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT, the ioctl would never let start both at the same time.

Replace the WARN_ON with message and return 0, indicating no error as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or cancel the paused state.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:24 +02:00
David Sterba
a17c95df4c btrfs: move clearing of EXCL_OP out of __cancel_balance
Make the clearning visible in the callers so we can pair it with the
test_and_set part.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:24 +02:00
David Sterba
72b81abf95 btrfs: move volume_mutex to callers of btrfs_rm_device
Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation
so it's obvious that the two happen at the same time.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:24 +02:00