Commit graph

969 commits

Author SHA1 Message Date
David Sterba
158da513b1 btrfs: refactor helper for bg flags to name conversion
The helper lacks the btrfs_ prefix and the parameter is the raw
blockgroup type, so none of the callers has to do the flags -> index
conversion.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
David Sterba
e3ecdb3fde btrfs: factor out devs_max setting in __btrfs_alloc_chunk
Merge the repeated code before the if-else block.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
946c9256c6 btrfs: factor out helper for counting data stripes
Factor the sequence of ifs to a helper, the 'data stripes' here means
the number of stripes without redundancy and parity.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
44b28adafd btrfs: use raid_attr table for btrfs_bg_type_to_factor
The factor is the number of copies.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
6079e12cdb btrfs: use raid_attr table to find profiles for integrity lowering
Replace open coded list of the profiles by selecting them from the
raid_attr table. The criteria are now more explicit, we need profiles
that have more than 1 copy of the data or can reconstruct the data with
a missing device.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
081db89b13 btrfs: use raid_attr to get allowed profiles for balance conversion
Iterate over the table and gather all allowed profiles for a given
number of devices, instead of open coding.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
fc9a2ac77c btrfs: use raid_attr in btrfs_chunk_max_errors
The number of tolerated failures is stored in the raid_attr table, use
it.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
c8bf1b6703 btrfs: remove mapping tree structures indirection
fs_info::mapping_tree is the physical<->logical mapping tree and uses
the same underlying structure as extents, but is embedded to another
structure. There are no other members and this indirection is useless.
No functional change.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
49cc180ca9 btrfs: raid56: allow the exact minimum number of devices for balance convert
The minimum number of devices for RAID5 is 2, though this is only a bit
expensive RAID1, and for RAID6 it's 3, which is a triple copy that works
only 3 devices.

mkfs.btrfs allows that and mounting such filesystem also works, so the
conversion via balance filters is inconsistent with the others and we
should not prevent it.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
0ee5f8ae08 btrfs: fix minimum number of chunk errors for DUP
The list of profiles in btrfs_chunk_max_errors lists DUP as a profile
DUP able to tolerate 1 device missing. Though this profile is special
with 2 copies, it still needs the device, unlike the others.

Looking at the history of changes, thre's no clear reason why DUP is
there, functions were refactored and blocks of code merged to one
helper.

d20983b40e Btrfs: fix writing data into the seed filesystem
  - factor code to a helper

de11cc12df Btrfs: don't pre-allocate btrfs bio
  - unrelated change, DUP still in the list with max errors 1

a236aed14c Btrfs: Deal with failed writes in mirrored configurations
  - introduced the max errors, leaves DUP and RAID1 in the same group

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
David Sterba
65237ee3b6 btrfs: get fs_info from device in btrfs_rm_dev_replace_free_srcdev
We can read fs_info from the device and can drop it from the parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:48 +02:00
David Sterba
163e97ee0d btrfs: get fs_info from device in btrfs_scrub_cancel_dev
We can read fs_info from the device and can drop it from the parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:47 +02:00
David Sterba
f331a9525f btrfs: get fs_info from device in btrfs_rm_dev_item
We can read fs_info from the device and can drop it from the parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:47 +02:00
David Sterba
196c9d8de8 btrfs: get fs_info from trans in btrfs_run_dev_stats
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:43 +02:00
David Sterba
5c466629e2 btrfs: get fs_info from trans in btrfs_finish_sprout
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:42 +02:00
David Sterba
6f8e0fc77c btrfs: get fs_info from trans in init_first_rw_device
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:42 +02:00
David Sterba
9b7a2440ae btrfs: get fs_info from trans in btrfs_create_tree
We can read fs_info from the transaction and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:41 +02:00
David Sterba
178507595c btrfs: get fs_info from eb in read_one_dev
We can read fs_info from extent buffer and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:39 +02:00
David Sterba
9690ac0987 btrfs: get fs_info from eb in read_one_chunk
We can read fs_info from extent buffer and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:39 +02:00
David Sterba
ddaf1d5aef btrfs: get fs_info from eb in btrfs_check_chunk_valid
We can read fs_info from extent buffer and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:39 +02:00
David Sterba
6ec0896c4c btrfs: get fs_info from eb in should_balance_chunk
We can read fs_info from extent buffer and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:39 +02:00
Nikolay Borisov
e74e3993bc btrfs: Factor out in_range macro
This is used in more than one places so let's factor it out in ctree.h.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:37 +02:00
Nikolay Borisov
60dfdf25bd btrfs: Remove 'trans' argument from find_free_dev_extent(_start)
Now that these functions no longer require a handle to transaction to
inspect pending/pinned chunks the argument can be removed. At the same
time also remove any surrounding code which acquired the handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:37 +02:00
Jeff Mahoney
1c11b63eff btrfs: replace pending/pinned chunks lists with io tree
The pending chunks list contains chunks that are allocated in the
current transaction but haven't been created yet. The pinned chunks
list contains chunks that are being released in the current transaction.
Both describe chunks that are not reflected on disk as in use but are
unavailable just the same.

The pending chunks list is anchored by the transaction handle, which
means that we need to hold a reference to a transaction when working
with the list.

The way we use them is by iterating over both lists to perform
comparisons on the stripes they describe for each device. This is
backwards and requires that we keep a transaction handle open while
we're trimming.

This patchset adds an extent_io_tree to btrfs_device that maintains
the allocation state of the device.  Extents are set dirty when
chunks are first allocated -- when the extent maps are added to the
mapping tree. They're cleared when last removed -- when the extent
maps are removed from the mapping tree. This matches the lifespan
of the pending and pinned chunks list and allows us to do trims
on unallocated space safely without pinning the transaction for what
may be a lengthy operation. We can also use this io tree to mark
which chunks have already been trimmed so we don't repeat the operation.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:37 +02:00
Nikolay Borisov
8e75fd893b btrfs: Stop using call_rcu for device freeing
btrfs_device structs are freed from RCU context since device iteration
is protected by RCU. Currently this is achieved by using call_rcu since
no blocking functions are called within btrfs_free_device. Future
refactoring of pending/pinned chunks will require calling sleeping
functions.

This patch is in preparation for these changes by simply switching from
RCU callbacks to explicit calls of synchronize_rcu and calling
btrfs_free_device directly. This is functionally equivalent, making sure
that there are no readers at that time.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:37 +02:00
Nikolay Borisov
39e264a40d btrfs: Populate ->orig_block_len during read_one_chunk
Chunks read from disk currently don't get their ->orig_block_len member
set, in contrast when a new chunk is allocated, the respective
extent_map's ->orig_block_len is assigned the size of the stripe of this
chunk.

Let's apply the same strategy for chunks which are read from
disk, not only does this codify the invariant that ->orig_block_len
always contains the size of the stripe for a chunk (when the em belongs
to the mapping tree). But it's also a preparatory patch for further work
around tracking chunk allocation in an extent tree rather than
pinned/pending lists.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:36 +02:00
Nikolay Borisov
61d0d0d2cb btrfs: Handle pending/pinned chunks before blockgroup relocation during device shrink
During device shrink pinned/pending chunks (i.e. those which have been
deleted/created respectively, in the current transaction and haven't
touched disk) need to be accounted when doing device shrink. Presently
this happens after the main relocation loop in btrfs_shrink_device,
which could lead to making another go in the body of the function.

Since there is no hard requirement to perform pinned/pending chunks
handling after the relocation loop, move the code before it. This leads
to simplifying the code flow around - i.e. no need to use 'goto again'.

A notable side effect of this change is that modification of the
device's size requires a transaction to be started and committed before
the relocation loop starts. This is necessary to ensure that relocation
process sees the shrunk device size.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:36 +02:00
Nikolay Borisov
bbbf7243d6 btrfs: combine device update operations during transaction commit
We currently overload the pending_chunks list to handle updating
btrfs_device->commit_bytes used.  We don't actually care about the
extent mapping or even the device mapping for the chunk - we just need
the device, and we can end up processing it multiple times.  The
fs_devices->resized_list does more or less the same thing, but with the
disk size.  They are called consecutively during commit and have more or
less the same purpose.

We can combine the two lists into a single list that attaches to the
transaction and contains a list of devices that need updating.  Since we
always add the device to a list when we change bytes_used or
disk_total_size, there's no harm in copying both values at once.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:36 +02:00
Qu Wenruo
ab4ba2e133 btrfs: tree-checker: Verify dev item
[BUG]
For fuzzed image whose DEV_ITEM has invalid total_bytes as 0, then
kernel will just panic:
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
  #PF error: [normal kernel read fault]
  PGD 800000022b2bd067 P4D 800000022b2bd067 PUD 22b2bc067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 0 PID: 1106 Comm: mount Not tainted 5.0.0-rc8+ #9
  RIP: 0010:btrfs_verify_dev_extents+0x2a5/0x5a0
  Call Trace:
   open_ctree+0x160d/0x2149
   btrfs_mount_root+0x5b2/0x680

[CAUSE]
If device extent verification finds a deivce with 0 total_bytes, then it
assumes it's a seed dummy, then search for seed devices.

But in this case, there is no seed device at all, causing NULL pointer.

[FIX]
Since this is caused by fuzzed image, let's go the tree-check way, just
add a new verification for device item.

Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202691
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:32 +02:00
Qu Wenruo
075cb3c78f btrfs: tree-checker: Check chunk item at tree block read time
Since we have btrfs_check_chunk_valid() in tree-checker, let's do
chunk item verification in tree-checker too.

Since the tree-checker is run at endio time, if one chunk leaf fails
chunk verification, we can still retry the other copy, making btrfs more
robust to fuzzed image as we may still get a good chunk item.

Also since we have done chunk verification in tree block read time, skip
the btrfs_check_chunk_valid() call in read_one_chunk() if we're reading
chunk items from leaf.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:32 +02:00
Qu Wenruo
82fc28fbed btrfs: Move btrfs_check_chunk_valid() to tree-check.[ch] and export it
By function, chunk item verification is more suitable to be done inside
tree-checker.

So move btrfs_check_chunk_valid() to tree-checker.c and export it.

And since it's now moved to tree-checker, also add a better comment for
what this function is doing.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:31 +02:00
Arnd Bergmann
290342f661 btrfs: use BUG() instead of BUG_ON(1)
BUG_ON(1) leads to bogus warnings from clang when
CONFIG_PROFILE_ANNOTATED_BRANCHES is set:

fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
                BUG_ON(1);
                ^~~~~~~~~
include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
 #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                   ^~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
 #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here
                             max_chunk_size);
                             ^~~~~~~~~~~~~~
include/linux/kernel.h:860:36: note: expanded from macro 'min'
 #define min(x, y)       __careful_cmp(x, y, <)
                                         ^
include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp'
                __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                              ^
include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once'
                typeof(y) unique_y = (y);               \
                                      ^
fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true
                BUG_ON(1);
                ^
include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
 #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                               ^
fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning
        u64 max_chunk_size;
                          ^
                           = 0

Change it to BUG() so clang can see that this code path can never
continue.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:28 +02:00
David Sterba
0cc068e6ee btrfs: don't report readahead errors and don't update statistics
As readahead is an optimization, all errors are usually filtered out,
but still properly handled when the real read call is done. The commit
5e9d398240 ("btrfs: readpages() should submit IO as read-ahead") added
REQ_RAHEAD to readpages() because that's only used for readahead
(despite what one would expect from the callback name).

This causes a flood of messages and inflated read error stats, so skip
reporting in case it's readahead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202403
Reported-by: LimeTech <tomm@lime-technology.com>
Fixes: 5e9d398240 ("btrfs: readpages() should submit IO as read-ahead")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-13 17:14:27 +01:00
Johannes Thumshirn
349ae63f40 btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
We recently had a customer issue with a corrupted filesystem. When
trying to mount this image btrfs panicked with a division by zero in
calc_stripe_length().

The corrupt chunk had a 'num_stripes' value of 1. calc_stripe_length()
takes this value and divides it by the number of copies the RAID profile
is expected to have to calculate the amount of data stripes. As a DUP
profile is expected to have 2 copies this division resulted in 1/2 = 0.
Later then the 'data_stripes' variable is used as a divisor in the
stripe length calculation which results in a division by 0 and thus a
kernel panic.

When encountering a filesystem with a DUP block group and a
'num_stripes' value unequal to 2, refuse mounting as the image is
corrupted and will lead to unexpected behaviour.

Code inspection showed a RAID1 block group has the same issues.

Fixes: e06cd3dd7c ("Btrfs: add validadtion checks for chunk loading")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:41 +01:00
Anand Jain
7faad6e25c btrfs: fix comment its device list mutex not volume lock
We have killed volume mutex (commit: dccdb07bc9
btrfs: kill btrfs_fs_info::volume_mutex). This a trival one seems to have
escaped.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:37 +01:00
Anand Jain
228a73abde btrfs: introduce new ioctl to unregister a btrfs device
Support for a new command that can be used eg. as a command

  $ btrfs device scan --forget [dev]'
(the final name may change though)

to undo the effects of 'btrfs device scan [dev]'. For this purpose
this patch proposes to use ioctl #5 as it was empty and is next to the
SCAN ioctl.

The new ioctl BTRFS_IOC_FORGET_DEV works only on the control device
(/dev/btrfs-control) to unregister one or all devices, devices that are
not mounted.

The argument is struct btrfs_ioctl_vol_args, ::name specifies the device
path. To unregister all device, the path is an empty string.

Again, the devices are removed only if they aren't part of a mounte
filesystem.

This new ioctl provides:

- release of unwanted btrfs_fs_devices and btrfs_devices structures
  from memory if the device is not going to be mounted

- ability to mount filesystem in degraded mode, when one devices is
  corrupted like in split brain raid1

- running test cases which would require reloading the kernel module
  but this is not possible eg. due to mounted filesystem or built-in

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:30 +01:00
Anand Jain
09ba3bc9dd btrfs: merge btrfs_find_device and find_device
Both btrfs_find_device() and find_device() does the same thing except
that the latter does not take the seed device onto account in the device
scanning context. We can merge them.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:24 +01:00
Anand Jain
70bc7088aa btrfs: refactor btrfs_free_stale_devices() to get return value
Preparatory patch to add ioctl that allows to forget a device (ie.
reverse of scan).

Refactors btrfs_free_stale_devices() to obtain return status. As this
function can fail if it can't find the given path (returns -ENOENT) or
trying to delete a mounted device (returns -EBUSY).

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:23 +01:00
Anand Jain
e4319cd9ca btrfs: refactor btrfs_find_device() take fs_devices as argument
btrfs_find_device() accepts fs_info as an argument and retrieves
fs_devices from fs_info.

Instead use fs_devices, so that this function can be used in non-mount
(during device scanning) context as well.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:23 +01:00
Anand Jain
6e927cebe2 btrfs: cleanup btrfs_find_device_by_devspec()
btrfs_find_device_by_devspec() finds the device by @devid or by
@device_path. This patch makes code flow easy to read by open coding the
else part and renames devpath to device_path.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:23 +01:00
Anand Jain
d95a830c78 btrfs: merge btrfs_find_device_missing_or_by_path() into parent
btrfs_find_device_missing_or_by_path() is relatively small function, and
its only parent btrfs_find_device_by_devspec() is small as well. Besides
there are a number of find_device functions. Merge
btrfs_find_device_missing_or_by_path() into its parent.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:22 +01:00
Al Viro
92900e5160 btrfs: fix potential oops in device_list_add
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.

Fixes: d1a6300282 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-30 18:16:40 +01:00
Qu Wenruo
1b3922a8bc btrfs: Use real device structure to verify dev extent
[BUG]
Linux v5.0-rc1 will fail fstests/btrfs/163 with the following kernel
message:

  BTRFS error (device dm-6): dev extent devid 1 physical offset 13631488 len 8388608 is beyond device boundary 0
  BTRFS error (device dm-6): failed to verify dev extents against chunks: -117
  BTRFS error (device dm-6): open_ctree failed

[CAUSE]
Commit cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent
mapping check") introduced strict check on dev extents.

We use btrfs_find_device() with dev uuid and fs uuid set to NULL, and
only dependent on @devid to find the real device.

For seed devices, we call clone_fs_devices() in open_seed_devices() to
allow us search seed devices directly.

However clone_fs_devices() just populates devices with devid and dev
uuid, without populating other essential members, like disk_total_bytes.

This makes any device returned by btrfs_find_device(fs_info, devid,
NULL, NULL) is just a dummy, with 0 disk_total_bytes, and any dev
extents on the seed device will not pass the device boundary check.

[FIX]
This patch will try to verify the device returned by btrfs_find_device()
and if it's a dummy then re-search in seed devices.

Fixes: cf90d884b3 ("btrfs: Introduce mount time chunk <-> dev extent mapping check")
CC: stable@vger.kernel.org # 4.19+
Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-10 17:13:00 +01:00
Andrea Gelmini
52042d8e82 btrfs: Fix typos in comments and strings
The typos accumulate over time so once in a while time they get fixed in
a large patch.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:50 +01:00
Nikolay Borisov
15c8276302 btrfs: Remove 1st shrink/grow phase from balance
The first step of the rebalance process ensures there is 1MiB free on
each device. This number seems rather small. And in fact when talking to
the original authors their opinions were:

"man that's a little bonkers"
"i don't think we even need that code anymore"
"I think it was there to make sure we had room for the blank 1M at the
beginning. I bet it goes all the way back to v0"
"we just don't need any of that tho, i say we just delete it"

Clearly, this piece of code has lost its original intent throughout the
years. It doesn't really bring any real practical benefits to the
relocation process.

Additionally, this patch makes the balance process more lightweight by
removing a pair of shrink/grow operations which are rather expensive for
heavily populated filesystems. This is mainly due to shrink requiring
relocating block groups, involving heavy use of the btree.

The intermediate shrink/grow can fail and leave the filesystem in a
middle state that would need to be changed back by the user.

Suggested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:49 +01:00
Johannes Thumshirn
7073017aeb btrfs: use offset_in_page instead of open-coding it
Constructs like 'var & (PAGE_SIZE - 1)' or 'var & ~PAGE_MASK' can denote an
offset into a page.

So replace them by the offset_in_page() macro instead of open-coding it if
they're not used as an alignment check.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:45 +01:00
David Sterba
cb5583dd52 btrfs: dev-replace: open code trivial locking helpers
The dev-replace locking functions are now trivial wrappers around rw
semaphore that can be used directly everywhere. No functional change.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:45 +01:00
David Sterba
53176dde0a btrfs: dev-replace: remove custom read/write blocking scheme
After the rw semaphore has been added, the custom blocking using
::blocking_readers and ::read_lock_wq is redundant.

The blocking logic in __btrfs_map_block is replaced by extending the
time the semaphore is held, that has the same blocking effect on writes
as the previous custom scheme that waited until ::blocking_readers was
zero.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:45 +01:00
Nikolay Borisov
da12fe5414 btrfs: Refactor btrfs_merge_bio_hook
This function really checks whether adding more data to the bio will
straddle a stripe/chunk. So first let's give it a more appropraite name
- btrfs_bio_fits_in_stripe. Secondly, the offset parameter was never
used to just remove it. Thirdly, pages are submitted to either btree or
data inodes so it's guaranteed that tree->ops is set so replace the
check with an ASSERT. Finally, document the parameters of the function.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:42 +01:00
Anand Jain
7333bd02dc btrfs: balance: print to system log when balance ends or is paused
Print a kernel log message when the balance ends, either for cancel or
completed or if it is paused.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:39 +01:00