linux-stable/fs/btrfs
Filipe Manana bacff03bb2 Btrfs: fix race setting up and completing qgroup rescan workers
commit 13fc1d271a upstream.

There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.

        CPU 1                                                         CPU 2                                  CPU 3

 btrfs_ioctl_quota_rescan()
  btrfs_qgroup_rescan()
   qgroup_rescan_init()
    mutex_lock(&fs_info->qgroup_rescan_lock)
    spin_lock(&fs_info->qgroup_lock)

    fs_info->qgroup_flags |=
      BTRFS_QGROUP_STATUS_FLAG_RESCAN

    init_completion(
      &fs_info->qgroup_rescan_completion)

    fs_info->qgroup_rescan_running = true

    mutex_unlock(&fs_info->qgroup_rescan_lock)
    spin_unlock(&fs_info->qgroup_lock)

    btrfs_init_work()
     --> starts the worker

                                                        btrfs_qgroup_rescan_worker()
                                                         mutex_lock(&fs_info->qgroup_rescan_lock)

                                                         fs_info->qgroup_flags &=
                                                           ~BTRFS_QGROUP_STATUS_FLAG_RESCAN

                                                         mutex_unlock(&fs_info->qgroup_rescan_lock)

                                                         starts transaction, updates qgroup status
                                                         item, etc

                                                                                                           btrfs_ioctl_quota_rescan()
                                                                                                            btrfs_qgroup_rescan()
                                                                                                             qgroup_rescan_init()
                                                                                                              mutex_lock(&fs_info->qgroup_rescan_lock)
                                                                                                              spin_lock(&fs_info->qgroup_lock)

                                                                                                              fs_info->qgroup_flags |=
                                                                                                                BTRFS_QGROUP_STATUS_FLAG_RESCAN

                                                                                                              init_completion(
                                                                                                                &fs_info->qgroup_rescan_completion)

                                                                                                              fs_info->qgroup_rescan_running = true

                                                                                                              mutex_unlock(&fs_info->qgroup_rescan_lock)
                                                                                                              spin_unlock(&fs_info->qgroup_lock)

                                                                                                              btrfs_init_work()
                                                                                                               --> starts another worker

                                                         mutex_lock(&fs_info->qgroup_rescan_lock)

                                                         fs_info->qgroup_rescan_running = false

                                                         mutex_unlock(&fs_info->qgroup_rescan_lock)

							 complete_all(&fs_info->qgroup_rescan_completion)

Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info->qgroup_flags, which is expected and correct behaviour.

However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info->qgroup_rescan_running is set to false by CPU 2.

This race is making test case btrfs/171 (from fstests) to fail often:

  btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
#      --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
#      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
#      @@ -1,2 +1,3 @@
#       QA output created by 171
#      +ERROR: quota rescan failed: Operation now in progress
#       Silence is golden
#      ...
#      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)

That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.

Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info->qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info->qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.

Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info->qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).

Fixes: 57254b6ebc ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes: d2c609b834 ("btrfs: properly track when rescan worker is running")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:09 +02:00
..
tests btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_account_extent 2018-08-06 13:12:52 +02:00
acl.c Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl 2019-03-23 20:10:00 +01:00
async-thread.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c Btrfs: fix deadlock between fiemap and transaction commits 2019-08-25 10:47:54 +02:00
backref.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
btrfs_inode.h btrfs: use tagged writepage to mitigate livelock of snapshot 2019-02-12 19:47:11 +01:00
check-integrity.c btrfs: open-code bio_set_op_attrs 2018-08-06 13:12:44 +02:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c btrfs: correctly validate compression type 2019-09-16 08:22:19 +02:00
compression.h btrfs: correctly validate compression type 2019-09-16 08:22:19 +02:00
ctree.c btrfs: Relinquish CPUs in btrfs_compare_trees 2019-10-05 13:10:09 +02:00
ctree.h btrfs: fix allocation of free space cache v1 bitmap pages 2019-10-05 13:10:09 +02:00
dedupe.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delayed-inode.c btrfs: Remove fs_info from btrfs_delete_delayed_dir_index 2018-08-06 13:13:00 +02:00
delayed-inode.h btrfs: Remove fs_info from btrfs_delete_delayed_dir_index 2018-08-06 13:13:00 +02:00
delayed-ref.c btrfs: Streamline memory allocation failure handling in btrfs_add_delayed_tree_ref 2018-08-06 13:12:39 +02:00
delayed-ref.h btrfs: Remove fs_info from btrfs_add_delayed_data_ref 2018-08-06 13:12:34 +02:00
dev-replace.c btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-07-10 09:53:44 +02:00
dev-replace.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
dir-item.c btrfs: Remove fs_info from btrfs_insert_delayed_dir_index 2018-08-06 13:13:00 +02:00
disk-io.c btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails 2019-05-22 07:37:42 +02:00
disk-io.h btrfs: Check the first key and level for cached extent buffer 2019-05-22 07:37:42 +02:00
export.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent-tree.c btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type 2019-10-05 13:09:59 +02:00
extent_io.c btrfs: Remove extent_io_ops::fill_delalloc 2019-09-16 08:21:59 +02:00
extent_io.h btrfs: Remove extent_io_ops::fill_delalloc 2019-09-16 08:21:59 +02:00
extent_map.c btrfs: use fs_info for btrfs_handle_em_exist tracepoint 2018-05-28 18:07:17 +02:00
extent_map.h btrfs: use fs_info for btrfs_handle_em_exist tracepoint 2018-05-28 18:07:17 +02:00
file-item.c btrfs: simplify pointer chasing of local fs_info variables 2018-08-06 13:12:43 +02:00
file.c Btrfs: add missing inode version, ctime and mtime updates when punching hole 2019-07-26 09:14:27 +02:00
free-space-cache.c btrfs: fix allocation of free space cache v1 bitmap pages 2019-10-05 13:10:09 +02:00
free-space-cache.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
free-space-tree.c btrfs: Remove fs_info from btrfs_del_root 2018-08-06 13:13:00 +02:00
free-space-tree.h btrfs: Remove fs_info argument from add_to_free_space_tree 2018-05-28 18:07:36 +02:00
inode-item.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
inode-map.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c btrfs: fix allocation of free space cache v1 bitmap pages 2019-10-05 13:10:09 +02:00
ioctl.c Btrfs: do not allow trimming when a fs is mounted with the nologreplay option 2019-04-17 08:38:51 +02:00
Kconfig btrfs: add SPDX header to Kconfig 2018-04-12 16:29:55 +02:00
locking.c btrfs: replace waitqueue_actvie with cond_wake_up 2018-05-28 18:23:09 +02:00
locking.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
lzo.c btrfs: lzo: Harden inline lzo compressed extent decompression 2018-05-30 16:46:43 +02:00
Makefile btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
math.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ordered-data.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
ordered-data.h btrfs: remove remaing full_sync logic from btrfs_sync_file 2018-08-06 13:12:31 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: annotate unlikely branches after V0 extent type removal 2018-08-06 13:12:41 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: correctly validate compression type 2019-09-16 08:22:19 +02:00
props.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
qgroup.c Btrfs: fix race setting up and completing qgroup rescan workers 2019-10-05 13:10:09 +02:00
qgroup.h btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled 2018-11-13 11:08:56 -08:00
raid56.c btrfs: raid56: properly unmap parity page in finish_parity_scrub() 2019-04-03 06:26:21 +02:00
raid56.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: start readahead also in seed devices 2019-06-25 11:35:59 +08:00
ref-verify.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ref-verify.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
relocation.c btrfs: fix panic during relocation after ENOSPC before writeback happens 2019-05-31 06:46:13 -07:00
root-tree.c btrfs: Don't panic when we can't find a root key 2019-05-31 06:46:13 -07:00
scrub.c btrfs: init csum_list before possible free 2019-09-16 08:22:08 +02:00
send.c Btrfs: fix incremental send failure after deduplication 2019-08-06 19:06:53 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
struct-funcs.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
super.c btrfs: On error always free subvol_name in btrfs_mount 2019-02-06 17:30:14 +01:00
sysfs.c btrfs: sysfs: don't leak memory when failing add fsid 2019-05-31 06:46:02 -07:00
sysfs.h btrfs: sysfs: Use enum/define value for feature array definitions 2018-05-28 18:23:39 +02:00
transaction.c Btrfs: fix deadlock between fiemap and transaction commits 2019-08-25 10:47:54 +02:00
transaction.h Btrfs: fix deadlock between fiemap and transaction commits 2019-08-25 10:47:54 +02:00
tree-checker.c btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable 2018-12-08 12:59:10 +01:00
tree-checker.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
tree-defrag.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
tree-log.c Btrfs: fix assertion failure during fsync and use of stale transaction 2019-09-19 09:09:33 +02:00
tree-log.h Btrfs: sync log after logging new name 2018-08-23 17:37:26 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: Remove fs_info argument from btrfs_uuid_tree_rem 2018-05-30 16:46:53 +02:00
volumes.c btrfs: Use real device structure to verify dev extent 2019-09-16 08:22:01 +02:00
volumes.h btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-07-10 09:53:44 +02:00
xattr.c Btrfs: use nofs context when initializing security xattrs to avoid deadlock 2019-01-16 22:04:37 +01:00
xattr.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
zlib.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
zstd.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00