Commit graph

1031328 commits

Author SHA1 Message Date
David Sterba
d58ede8d1d btrfs: simplify data stripe calculation helpers
There are two helpers doing the same calculations based on nparity and
ncopies. calc_data_stripes can be simplified into one expression, so far
we don't have profile with both copies and parity, so there's no
effective change. calc_stripe_length should reuse the helper and not
repeat the same calculation.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:03 +02:00
David Sterba
fe4f46d40c btrfs: merge alloc_device helpers
The device allocation is split to two functions, but one just calls the
other and they're very far in the file. Merge them together.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:03 +02:00
David Sterba
500a44c9b3 btrfs: uninline btrfs_bg_flags_to_raid_index
The helper does a simple translation from block group flags to index to
the btrfs_raid_array table. There's no apparent reason to inline the
function, the translation happens usually once per function and is not
called in a loop.

Making it a proper function saves quite some binary code (x86_64,
release config):

   text    data     bss     dec     hex filename
1164011   19253   14912 1198176  124860 pre/btrfs.ko
1161559   19253   14912 1195724  123ecc post/btrfs.ko

DELTA: -2451

Also add the const attribute as there are no side effects, this could
help compiler to optimize a few things without the function body.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:03 +02:00
David Sterba
6c154ba41b btrfs: tree-checker: add missing stripe checks for raid1c3/4 profiles
The stripe checks for raid1c3/raid1c4 are missing in the sequence in
btrfs_check_chunk_valid.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:03 +02:00
David Sterba
0ac6e06b6c btrfs: tree-checker: use table values for stripe checks
There are hardcoded values in several checks regarding chunks and stripe
constraints. We have that defined in the raid table and ought to use it.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
David Sterba
809d6902b3 btrfs: make btrfs_next_leaf static inline
btrfs_next_leaf is a simple wrapper for btrfs_next_old_leaf so move it
to header to avoid the function call overhead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
David Sterba
f41b6ba93d btrfs: remove uptodate parameter from btrfs_dec_test_first_ordered_pending
In commit e65f152e43 ("btrfs: refactor how we finish ordered extent io
for endio functions") there was last caller not using 1 for the uptodate
parameter. Now there's only one, passing 1, so we can remove it and
simplify the code.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
David Sterba
25c1252a02 btrfs: switch uptodate to bool in btrfs_writepage_endio_finish_ordered
The uptodate parameter should be bool, change the type.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
Qu Wenruo
a129ffb816 btrfs: remove unused start and end parameters from btrfs_run_delalloc_range()
Since commit d75855b451 ("btrfs: Remove
extent_io_ops::writepage_start_hook") removes the writepage_start_hook()
and adds btrfs_writepage_cow_fixup() function, there is no need to
follow the old hook parameters.

Remove the @start and @end hook, since currently the fixup check is full
page check, it doesn't need @start and @end hook.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
Marcos Paulo de Souza
a7d1c5dc86 btrfs: introduce btrfs_lookup_match_dir
btrfs_search_slot is called in multiple places in dir-item.c to search
for a dir entry, and then calling btrfs_match_dir_name to return a
btrfs_dir_item.

In order to reduce the number of callers of btrfs_search_slot, create a
common function that looks for the dir key, and if found call
btrfs_match_dir_item_name.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:02 +02:00
Marcos Paulo de Souza
f8ee80de7b btrfs: remove unneeded return variable in btrfs_lookup_file_extent
We can return from btrfs_search_slot directly which also shows that it
follows the same return value convention.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Marcos Paulo de Souza
ad9a937850 btrfs: use btrfs_next_leaf instead of btrfs_next_item when slots > nritems
After calling btrfs_search_slot is a common practice to check if the
slot found isn't bigger than number of slots in the current leaf, and if
so, search for the same key in the next leaf by calling btrfs_next_leaf,
which calls btrfs_next_old_leaf to do the job.

Calling btrfs_next_item in the same situation would end up in the same
code flow, since

* btrfs_next_item
  * btrfs_next_old_item
    * if slot >= nritems(curr_leaf)
      btrfs_next_old_leaf

Change btrfs_verify_dev_extents and calculate_emulated_zone_size
functions to use btrfs_next_leaf in the same situation.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
c7bcbb2120 btrfs: remove ignore_offset argument from btrfs_find_all_roots()
Currently all the callers of btrfs_find_all_roots() pass a value of false
for its ignore_offset argument. This makes the argument pointless and we
can remove it and make btrfs_find_all_roots() always pass false as the
ignore_offset argument for btrfs_find_all_roots_safe(). So just do that.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
2ac691d8b3 btrfs: avoid unnecessary lock and leaf splits when updating inode in the log
During a fast fsync, if we have already fsynced the file before and in the
current transaction, we can make the inode item update more efficient and
avoid acquiring a write lock on the leaf's parent.

To update the inode item we are always using btrfs_insert_empty_item() to
get a path pointing to the inode item, which calls btrfs_search_slot()
with an "ins_len" argument of 'sizeof(struct btrfs_inode_item) +
sizeof(struct btrfs_item)', and that always results in the search taking
a write lock on the level 1 node that is the parent of the leaf that
contains the inode item. This adds unnecessary lock contention on log
trees when we have multiple fsyncs in parallel against inodes in the same
subvolume, which has a very significant impact due to the fact that log
trees are short lived and their height very rarely goes beyond level 2.

Also, by using btrfs_insert_empty_item() when we need to update the inode
item, we also end up splitting the leaf of the existing inode item when
the leaf has an amount of free space smaller than the size of an inode
item.

Improve this by using btrfs_seach_slot(), with a 0 "ins_len" argument,
when we know the inode item already exists in the log. This avoids these
two inefficiencies.

The following script, using fio, was used to perform the tests:

  $ cat fio-test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-d single -m single"

  if [ $# -ne 4 ]; then
    echo "Use $0 NUM_JOBS FILE_SIZE FSYNC_FREQ BLOCK_SIZE"
    exit 1
  fi

  NUM_JOBS=$1
  FILE_SIZE=$2
  FSYNC_FREQ=$3
  BLOCK_SIZE=$4

  cat <<EOF > /tmp/fio-job.ini
  [writers]
  rw=randwrite
  fsync=$FSYNC_FREQ
  fallocate=none
  group_reporting=1
  direct=0
  bs=$BLOCK_SIZE
  ioengine=sync
  size=$FILE_SIZE
  directory=$MNT
  numjobs=$NUM_JOBS
  EOF

  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  echo
  echo "Using config:"
  echo
  cat /tmp/fio-job.ini
  echo
  echo "mount options: $MOUNT_OPTIONS"
  echo

  umount $MNT &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  fio /tmp/fio-job.ini
  umount $MNT

The tests were done on a physical machine, with 12 cores, 64G of RAM,
using a NVMEe device and using a non-debug kernel config (the default one
from Debian). The summary line from fio is provided below for each test
run.

With 8 jobs, file size 256M, fsync frequency of 4 and a block size of 4K:

Before: WRITE: bw=28.3MiB/s (29.7MB/s), 28.3MiB/s-28.3MiB/s (29.7MB/s-29.7MB/s), io=2048MiB (2147MB), run=72297-72297msec
After:  WRITE: bw=28.7MiB/s (30.1MB/s), 28.7MiB/s-28.7MiB/s (30.1MB/s-30.1MB/s), io=2048MiB (2147MB), run=71411-71411msec

+1.4% throughput, -1.2% runtime

With 16 jobs, file size 256M, fsync frequency of 4 and a block size of 4K:

Before: WRITE: bw=40.0MiB/s (42.0MB/s), 40.0MiB/s-40.0MiB/s (42.0MB/s-42.0MB/s), io=4096MiB (4295MB), run=99980-99980msec
After:  WRITE: bw=40.9MiB/s (42.9MB/s), 40.9MiB/s-40.9MiB/s (42.9MB/s-42.9MB/s), io=4096MiB (4295MB), run=97933-97933msec

+2.2% throughput, -2.1% runtime

The changes are small but it's possible to be better on faster hardware as
in the test machine used disk utilization was pretty much 100% during the
whole time the tests were running (observed with 'iostat -xz 1').

The tests also included the previous patch with the subject of:
"btrfs: avoid unnecessary log mutex contention when syncing log".
So they compared a branch without that patch and without this patch versus
a branch with these two patches applied.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
e68107e51f btrfs: remove unnecessary list head initialization when syncing log
One of the last steps of syncing the log is to remove all log contexts
from the root's list of contexts, done at btrfs_remove_all_log_ctxs().
There we iterate over all the contexts in the list and delete each one
from the list, and after that we call INIT_LIST_HEAD() on the list. That
is unnecessary since at that point the list is empty.

So just remove the INIT_LIST_HEAD() call. It's not needed, increases code
size (bloat-o-meter reported a delta of -122 for btrfs_sync_log() after
this change) and increases two critical sections delimited by log mutexes.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
e1a6d26483 btrfs: avoid unnecessary log mutex contention when syncing log
When syncing the log we acquire the root's log mutex just to update the
root's last_log_commit. This is unnecessary because:

1) At this point there can only be one task updating this value, which is
   the task committing the current log transaction. Any task that enters
   btrfs_sync_log() has to wait for the previous log transaction to commit
   and wait for the current log transaction to commit if someone else
   already started it (in this case it never reaches to the point of
   updating last_log_commit, as that is done by the committing task);

2) All readers of the root's last_log_commit don't acquire the root's
   log mutex. This is to avoid blocking the readers, potentially for too
   long and because getting a stale value of last_log_commit does not
   cause any functional problem, in the worst case getting a stale value
   results in logging an inode unnecessarily. Plus it's actually very
   rare to get a stale value that results in unnecessarily logging the
   inode.

So in order to avoid unnecessary contention on the root's log mutex,
which is used for several different purposes, like starting/joining a
log transaction and starting writeback of a log transaction, stop
acquiring the log mutex for updating the root's last_log_commit.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
cceaa89f02 btrfs: remove racy and unnecessary inode transaction update when using no-holes
When using the NO_HOLES feature and expanding the size of an inode, we
update the inode's last_trans, last_sub_trans and last_log_commit fields
at maybe_insert_hole() so that a fsync does know that the inode needs to
be logged (by making sure that btrfs_inode_in_log() returns false). This
happens for expanding truncate operations, buffered writes, direct IO
writes and when cloning extents to an offset greater than the inode's
i_size.

However the way we do it is racy, because in between setting the inode's
last_sub_trans and last_log_commit fields, the log transaction ID that was
assigned to last_sub_trans might be committed before we read the root's
last_log_commit and assign that value to last_log_commit. If that happens
it would make a future call to btrfs_inode_in_log() return true. This is
a race that should be extremely unlikely to be hit in practice, and it is
the same that was described by commit bc0939fcfa ("btrfs: fix race
between marking inode needs to be logged and log syncing").

The fix would simply be to set last_log_commit to the value we assigned
to last_sub_trans minus 1, like it was done in that commit. However
updating these two fields plus the last_trans field is pointless here
because all the callers of btrfs_cont_expand() (which is the only
caller of maybe_insert_hole()) always call btrfs_set_inode_last_trans()
or btrfs_update_inode() after calling btrfs_cont_expand(). Calling either
btrfs_set_inode_last_trans() or btrfs_update_inode() guarantees that the
next fsync will log the inode, as it makes btrfs_inode_in_log() return
false.

So just remove the code that explicitly sets the inode's last_trans,
last_sub_trans and last_log_commit fields.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
Filipe Manana
5a656c3628 btrfs: stop doing GFP_KERNEL memory allocations in the ref verify tool
In commit 351cbf6e44 ("btrfs: use nofs allocations for running delayed
items") we wrapped all btree updates when running delayed items with
memalloc_nofs_save() and memalloc_nofs_restore(), due to a lock inversion
detected by lockdep involving reclaim and the mutex of delayed nodes.

The problem is because the ref verify tool does some memory allocations
with GFP_KERNEL, which can trigger reclaim and reclaim can trigger inode
eviction, which requires locking the mutex of an inode's delayed node.
On the other hand the ref verify tool is called when allocating metadata
extents as part of operations that modify a btree, which is a problem when
running delayed nodes, where we do btree updates while holding the mutex
of a delayed node. This is what caused the lockdep warning.

Instead of wrapping every btree update when running delayed nodes, change
the ref verify tool to never do GFP_KERNEL allocations, because:

1) We get less repeated code, which at the moment does not even have a
   comment mentioning why we need to setup the NOFS context, which is a
   recommended good practice as mentioned at
   Documentation/core-api/gfp_mask-from-fs-io.rst

2) The ref verify tool is something meant only for debugging and not
   something that should be enabled on non-debug / non-development
   kernels;

3) We may have yet more places outside delayed-inode.c where we have
   similar problem: doing btree updates while holding some lock and
   then having the GFP_KERNEL memory allocations, from the ref verify
   tool, trigger reclaim and trying again to acquire the same lock
   through the reclaim path.
   Or we could get more such cases in the future, therefore this change
   prevents getting into similar cases when using the ref verify tool.

Curiously most of the memory allocations done by the ref verify tool
were already using GFP_NOFS, except a few ones for no apparent reason.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
Filipe Manana
506650dcb3 btrfs: improve the batch insertion of delayed items
When we insert the delayed items of an inode, which corresponds to the
directory index keys for a directory (key type BTRFS_DIR_INDEX_KEY), we
do the following:

1) Pick the first delayed item from the rbtree and insert it into the
   fs/subvolume btree, using btrfs_insert_empty_item() for that;

2) Without releasing the path returned by btrfs_insert_empty_item(),
   keep collecting as many consecutive delayed items from the rbtree
   as possible, as long as each one's BTRFS_DIR_INDEX_KEY key is the
   immediate successor of the previously picked item and as long as
   they fit in the available space of the leaf the path points to;

3) Then insert all the collected items into the leaf;

4) Release the reserve metadata space for each collected item and
   release each item (implies deleting from the rbtree);

5) Unlock the path.

While this is much better than inserting items one by one, it can be
improved in a few aspects:

1) Instead of adding items based on the remaining free space of the
   leaf, collect as many items that can fit in a leaf and bulk insert
   them. This results in less and larger batches, reducing the total
   amount of time to insert the delayed items. For example when adding
   100K files to a directory, we ended up creating 1658 batches with
   very variable sizes ranging from 1 item to 118 items, on a filesystem
   with a node/leaf size of 16K. After this change, we end up with 839
   batches, with the vast majority of them having exactly 120 items;

2) We do the search for more items to batch, by iterating the rbtree,
   while holding a write lock on the leaf;

3) While still holding the leaf locked, we are releasing the reserved
   metadata for each item and then deleting each item, keeping a write
   lock on the leaf for longer than necessary. Releasing the delayed items
   one by one can take a significant amount of time, because deleting
   them from the rbtree can often be a bit slow when the deletion results
   in rebalancing the rbtree.

So change this so that we try to create larger batches, with a total
item size up to the maximum a leaf can support, and by unlocking the leaf
immediately after inserting the items, releasing the reserved metadata
space of each item and releasing each item without holding the write lock
on the leaf.

The following script that runs fs_mark was used to test this change:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-m single -d single"
  FILES=1000000
  THREADS=16
  FILE_SIZE=0

  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  umount $DEV &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 5 -n $FILES -s $FILE_SIZE -t 16"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

It was run on machine with 12 cores, 64G of ram, using a NVMe device and
using a non-debug kernel config (Debian's default config).

Results before this change:

FSUse%        Count         Size    Files/sec         App Overhead
     1     16000000            0      76182.1             72223046
     3     32000000            0      62746.9             80776528
     5     48000000            0      77029.0             93022381
     6     64000000            0      73691.6             95251075
     8     80000000            0      66288.0             85089634

Results after this change:

FSUse%        Count         Size    Files/sec         App Overhead
     1     16000000            0      79049.5 (+3.7%)     69700824
     3     32000000            0      65248.9 (+3.9%)     80583693
     5     48000000            0      77991.4 (+1.2%)     90040908
     6     64000000            0      75096.8 (+1.9%)     89862241
     8     80000000            0      66926.8 (+1.0%)     84429169

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
Qu Wenruo
2b29726c47 btrfs: rescue: allow ibadroots to skip bad extent tree when reading block group items
When extent tree gets corrupted, normally it's not extent tree root, but
one toasted tree leaf/node.

In that case, rescue=ibadroots mount option won't help as it can only
handle the extent tree root corruption.

This patch will enhance the behavior by:

- Allow fill_dummy_bgs() to ignore -EEXIST error

  This means we may have some block group items read from disk, but
  then hit some error halfway.

- Fallback to fill_dummy_bgs() if any error gets hit in
  btrfs_read_block_groups()

  Of course, this still needs rescue=ibadroots mount option.

With that, rescue=ibadroots can handle extent tree corruption more
gracefully and allow a better recover chance.

Reported-by: Zhenyu Wu <wuzy001@gmail.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg114424.html
Reviewed-by: Su Yue <l@damenly.su>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
Marcos Paulo de Souza
6534c0c99d btrfs: pass NULL as trans to btrfs_search_slot if we only want to search
Using a transaction in btrfs_search_slot is only useful when we are
searching to add or modify the tree. When the function is used for
searching, insert length and mod arguments are 0, there is no need to
use a transaction.

No functional changes, changing for consistency.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
Filipe Manana
069a2e3778 btrfs: continue readahead of siblings even if target node is in memory
At reada_for_search(), when attempting to readahead a node or leaf's
siblings, we skip the readahead of the siblings if the node/leaf is
already in memory. That is probably fine for the READA_FORWARD and
READA_BACK readahead types, as they are used on contexts where we
end up reading some consecutive leaves, but usually not the whole btree.

However for a READA_FORWARD_ALWAYS mode, currently only used for full
send operations, it does not make sense to skip the readahead if the
target node or leaf is already loaded in memory, since we know the caller
is visiting every node and leaf of the btree in ascending order.

So change the behaviour to not skip the readahead when the target node is
already in memory and the readahead mode is READA_FORWARD_ALWAYS.

The following test script was used to measure the improvement on a box
using an average, consumer grade, spinning disk, with 32GiB of RAM and
using a non-debug kernel config (Debian's default config).

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj
  MKFS_OPTIONS="--nodesize 16384"     # default, just to be explicit
  MOUNT_OPTIONS="-o max_inline=2048"  # default, just to be explicit

  mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
  mount $MOUNT_OPTIONS $DEV $MNT

  # Create files with inline data to make it easier and faster to create
  # large btrees.
  add_files()
  {
      local total=$1
      local start_offset=$2
      local number_jobs=$3
      local total_per_job=$(($total / $number_jobs))

      echo "Creating $total new files using $number_jobs jobs"
      for ((n = 0; n < $number_jobs; n++)); do
          (
              local start_num=$(($start_offset + $n * $total_per_job))
              for ((i = 1; i <= $total_per_job; i++)); do
                  local file_num=$((start_num + $i))
                  local file_path="$MNT/file_${file_num}"
                  xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null
                  if [ $? -ne 0 ]; then
                      echo "Failed creating file $file_path"
                      break
                  fi
              done
          ) &
          worker_pids[$n]=$!
      done

      wait ${worker_pids[@]}

      sync
      echo
      echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | egrep '^(node|leaf) ' | wc -l)"
  }

  file_count=2000000
  add_files $file_count 0 4

  echo
  echo "Creating snapshot..."
  btrfs subvolume snapshot -r $MNT $MNT/snap1

  umount $MNT

  echo 3 > /proc/sys/vm/drop_caches
  blockdev --flushbufs $DEV &> /dev/null
  hdparm -F $DEV &> /dev/null

  mount $MOUNT_OPTIONS $DEV $MNT

  echo
  echo "Testing full send..."
  start=$(date +%s)
  btrfs send $MNT/snap1 > /dev/null
  end=$(date +%s)
  echo
  echo "Full send took $((end - start)) seconds"

  umount $MNT

The duration of the full send operations, in seconds, were the following:

Before this change:  85 seconds
After this change:   76 seconds (-11.2%)

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
David Sterba
5da3847992 btrfs: check-integrity: drop kmap/kunmap for block pages
The pages in block_ctx have never been allocated from highmem (in
btrfsic_read_block) so the mapping is pointless and can be removed.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
David Sterba
4c2bf276b5 btrfs: compression: drop kmap/kunmap from generic helpers
The pages in compressed_pages are not from highmem anymore so we can
drop the mapping for checksum calculation and inline extent.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:00 +02:00
David Sterba
bbaf9715f3 btrfs: compression: drop kmap/kunmap from zstd
As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
simply page_address and kunmap is a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
David Sterba
696ab562e6 btrfs: compression: drop kmap/kunmap from zlib
As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
simply page_address and kunmap is a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
David Sterba
8c945d32e6 btrfs: compression: drop kmap/kunmap from lzo
As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
simply page_address and kunmap is a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
David Sterba
b0ee5e1ec4 btrfs: drop from __GFP_HIGHMEM all allocations
The highmem flag is used for allocating pages for compression and for
raid56 pages. The high memory makes sense on 32bit systems but is not
without problems. On 64bit system's it's just another layer of wrappers.

The time the pages are allocated for compression or raid56 is relatively
short (about a transaction commit), so the pages are not blocked
indefinitely. As the number of pages depends on the amount of data being
written/read, there's a theoretical problem. A fast device on a 32bit
system could use most of the low memory pool, while with the highmem
allocation that would not happen. This was possibly the original idea
long time ago, but nowadays we optimize for 64bit systems.

This patch removes all usage of the __GFP_HIGHMEM flag for page
allocation, the kmap/kunmap are still in place and will be removed in
followup patches. Remaining is masking out the bit in
alloc_extent_state and __lookup_free_space_inode, that can safely stay.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
Anand Jain
23608d51a3 btrfs: cleanup fs_devices pointer usage in btrfs_trim_fs
Drop variable 'devices' (used only once) and add new variable for
the fs_devices, so it is used at two locations within btrfs_trim_fs()
function and also helps to access fs_devices->devices.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
Marcos Paulo de Souza
67d5e289a1 btrfs: remove max argument from generic_bin_search
Both callers use btrfs_header_nritems to feed the max argument.  Remove
the argument and let generic_bin_search call it itself.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
Nikolay Borisov
2eadb9e75e btrfs: make btrfs_finish_chunk_alloc private to block-group.c
One of the final things that must be done to add a new chunk is
inserting its device extent items in the device tree. They describe
the portion of allocated device physical space during phase 1 of
chunk allocation. This is currently done in btrfs_finish_chunk_alloc
whose name isn't very informative. What's more, this function is only
used in block-group.c but is defined as public. There isn't anything
special about it that would warrant it being defined in volumes.c.

Just move btrfs_finish_chunk_alloc and alloc_chunk_dev_extent to
block-group.c, make the former static and rename both functions to
insert_dev_extents and insert_dev_extent respectively.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:59 +02:00
Anand Jain
4a9531cf89 btrfs: check-integrity: drop unnecessary function prototypes
The function prototypes below aren't necessary as the functions are
first defined before called. Remove them.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:58 +02:00
David Sterba
b3b7e1d0b4 btrfs: add special case to setget helpers for 64k pages
On 64K pages the size of the extent_buffer::pages array is 1 and
compilation with -Warray-bounds warns due to

  kaddr = page_address(eb->pages[idx + 1]);

when reading byte range crossing page boundary.

This does never actually overflow the array because on 64K because all
the data fit in one page and bounds are checked by check_setget_bounds.

To fix the reported overflows and warnings add a compile-time condition
that will allow compiler to eliminate the dead code that reads from the
idx + 1 page.

Link: https://lore.kernel.org/lkml/20210623083901.1d49d19d@canb.auug.org.au/
CC: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:58 +02:00
Johannes Thumshirn
5a80d1c6a2 btrfs: zoned: remove max_zone_append_size logic
There used to be a patch in the original series for zoned support which
limited the extent size to max_zone_append_size, but this patch has been
dropped somewhere around v9.

We've decided to go the opposite direction, instead of limiting extents
in the first place we split them before submission to comply with the
device's limits.

Remove the related code, btrfs_fs_info::max_zone_append_size and
btrfs_zoned_device_info::max_zone_append_size.

This also removes the workaround for dm-crypt introduced in
1d68128c10 ("btrfs: zoned: fail mount if the device does not support
zone append") because the fix has been merged as f34ee1dce6 ("dm
crypt: Fix zoned block device support").

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:18:58 +02:00
Linus Torvalds
e22ce8eb63 Linux 5.14-rc7 2021-08-22 14:24:56 -07:00
Linus Torvalds
1bdc3d5be7 powerpc fixes for 5.14 #6
- Fix random crashes on some 32-bit CPUs by adding isync() after locking/unlocking KUEP
  - Fix intermittent crashes when loading modules with strict module RWX
  - Fix a section mismatch introduce by a previous fix.
 
 Thanks to: Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo Opsfelder Araújo,
 Nathan Chancellor, Stan Johnson.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEhjVUTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM/VD/4o5D9d2Xppt+T0zdnKomq+3ffkC33/
 zBK4vqOVOXbnlRpChqIqsHB3LNxMNMvTVaoLvxgy3ZQ57+rnirSDaFOaj4Nbazdx
 STwWmyxW9xPshqvj8tz8uHadSkvbrCClFy59FXtJf4H/iztTnQORKnXI9r3wxXS+
 wBhw8Nhquuqg4O5h4q6yLLRIAaskus7uymDzYHVZkHO0RhfPLEZJwfCxydc29ukK
 wIRB6qojFBbWm/UscY1w6FiYrBn4Y5F3DzoTzJ7xlO6l1NYaE+58aun/oTGU7922
 /8fXYs34TnkF6sA9qGhOtOc1MXfH8meFoH9s/fY3Z3O88xTe8k15wo2Ujlk/u0X1
 1Gzv9FZI0RnpPtSLPiiu72/zS/vFOxAVCFMTvcodlte9RN90fW5Qwt/O1ya22vWt
 Ea3O9iNmYgQ+lV7ZZYDtKQ22WHIublg6cY5d3NDyj5HrzN/vGyp3QJFb2dnWoEpx
 k/KkK16oiIlduLGiFoYjn1ELyHUBTvp483y7zmspA4fCb0ue6W8b2zt8FszH0hI7
 N4uroGXuk9OyhNsLWR8UHUR0s6Gi0XSaQ0O4XgWfoDAAvdev4oZiCqw0q5552OvX
 eE/Ogxc7INCiaoeLwOhYhCKjr+jBP8fhqyQzquyqMgUqEbxLtcFZCJ09bpXHjjiH
 OlAvZwlzOhwcKg==
 =K2B0
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix random crashes on some 32-bit CPUs by adding isync() after
   locking/unlocking KUEP

 - Fix intermittent crashes when loading modules with strict module RWX

 - Fix a section mismatch introduce by a previous fix.

Thanks to Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo
Opsfelder Araújo, Nathan Chancellor, and Stan Johnson.

h# -----BEGIN PGP SIGNATURE-----

* tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fix set_memory_*() against concurrent accesses
  powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
  powerpc/xive: Do not mark xive_request_ipi() as __init
2021-08-22 09:49:31 -07:00
Linus Torvalds
9ff50bf2f2 Two clk driver fixes
- Make the regulator state match the GDSC power domain state at boot
    on Qualcomm SoCs so that the regulator isn't turned off
    inadvertently.
 
  - Fix earlycon on i.MX6Q SoCs
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmEhQMcRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSX+8RAA1oxdmw6UzJPh0QgH89oV+B8/SAPF4+so
 h8froa/uvy/xDzhtd326umxEQmu4JVnwtXepSQEvSbBiKl+Llz9ODCjQkMI3RoXd
 1QbpCtb4TFWKk0pYZeV5XqhWHWXgzkT3dsFU6buWD2SscGbenWAXm3dDScanYulb
 7XPXVXvNoTc+ekChzexUsvMnDVlwIBcDTMkiQ0N4penOMJWKFS40zfPbZwUpJZC1
 M/V2Ua7cvDLVWKqnRBm2xlnhn1j23xKVWLtElJYq75Ea86zihZbiq8bri22fgnNY
 /K0H3IQKtoMkRq7JhYlWtqX9tVFUP+ia2KBlW/lXk0W4XWvTmu6zWp8BBPGWJXDx
 G+Os+Dzj351RiKmIRRVMFS8mUCw5Sjs5T6YHwR4yItxp4syUTcPCHiPAJ/en6Y9U
 alC+6F2EEe8rVgQT3O0UTIGwHSXs8nssl2A56IGoIgB8eO/GkjnqtuESBGsPJXZP
 SfOKSFKwr9sSjw5lhyvO6K/k+VfpIbxI3eFU1K9MbOTVW/1R7anbZ2REtO/ncLu7
 nazLlsWYos/XodRNjdKRuDmgQKwr63Dq/AHaKeF/nJxLRLG8kf3b2U3ij9A9LvfN
 WUP5gi8UsFuzbpwUVe2MCT5ro9M607kDydatfnvfLtllOov6JgWMSJhQPs/NIWrd
 3rGb3K1Nmw0=
 =EGjz
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk driver fixes from Stephen Boyd:

 - Make the regulator state match the GDSC power domain state at boot on
   Qualcomm SoCs so that the regulator isn't turned off inadvertently.

 - Fix earlycon on i.MX6Q SoCs

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: gdsc: Ensure regulator init state matches GDSC state
  clk: imx6q: fix uart earlycon unwork
2021-08-21 11:27:16 -07:00
Linus Torvalds
9085423f0e Char/Misc driver fixes for 5.14-rc7
Here are some small driver fixes for 5.14-rc7.
 
 They consist of:
 	- revert for an interconnect patch that was found to have
 	  problems.
 	- ipack tpci200 driver fixes for reported problems
 	- slimbus messaging and ngd fixes for reported problems.
 
 All are small and have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYSEfhw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylxAQCfYvG2nCMkIvBi7MXNwRS0go5YJyAAn3RprTCU
 V7dduATGH8zJ4u6AOdtg
 =Uq3v
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small driver fixes for 5.14-rc7.

  They consist of:

   - revert for an interconnect patch that was found to have problems

   - ipack tpci200 driver fixes for reported problems

   - slimbus messaging and ngd fixes for reported problems

  All are small and have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ipack: tpci200: fix memory leak in the tpci200_register
  ipack: tpci200: fix many double free issues in tpci200_pci_probe
  slimbus: ngd: reset dma setup during runtime pm
  slimbus: ngd: set correct device for pm
  slimbus: messaging: check for valid transaction id
  slimbus: messaging: start transaction ids from 1 instead of zero
  Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate"
2021-08-21 11:22:10 -07:00
Linus Torvalds
f4ff9e6b01 USB fix for 5.14-rc7
Here is a single USB typec tcpm fix for a reported problem for 5.14-rc7.
 It showed up in 5.13 and resolves an issue that Hans found.  It has been
 in linux-next this week with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYSEgIQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymZpQCfeD9XR0ExAJgn7E0sNuLaOi2BxmUAn3Eu2IFu
 dFcSEpQtg23mY6aBFWfH
 =7kfD
 -----END PGP SIGNATURE-----

Merge tag 'usb-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fix from Greg KH:
 "Here is a single USB typec tcpm fix for a reported problem for
  5.14-rc7. It showed up in 5.13 and resolves an issue that Hans found.
  It has been in linux-next this week with no reported problems"

* tag 'usb-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Fix VDMs sometimes not being forwarded to alt-mode drivers
2021-08-21 11:10:06 -07:00
Linus Torvalds
a09434f181 RISC-V Fixes for 5.14-rc7
* A fix to the sifive-l2-cache device tree bindings, for json-schema
   compatibility.  This does not change the intended behavior of the
   binding.
 * A fix to avoid improperly freeing necessary resources during early
   boot.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmEhCRsTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYid/tEACWj6O8ZK0bOCy4MESRk6YVkYep/xw8
 tTSmfRn23EF2+j/3cdJ2a2s/FWgy/FcpygdtBo/WP0EOs9HLAlCPYiN3iVmsEpge
 1QGXPCJn0w0lS0CVNaMTqABHnFNgQYwDbMrbonveDYW+UAJbrRfXB9nKy4HX//UC
 GsXH/zk7STplwiBfXsHY6lzWJakI3DTlToc2XthwLLSeE8q1bcEtLobtCcnR24l8
 pAw3lye7YgEFlcTB+Ud6BAWDlkLT8mf75wVLxFxxlwbRb2yFCwtcONtHiozEEs6O
 qVgH2ZoftBTPefB3KvcRMrtYv6QjTaDxw3zGRMwyfYwWYEtnmzvDqU+dacRMTcWj
 NbmPOTMTqkQl+MZOsMYgfKg4VRt/E+n2otssjix3IQGNXGEo01OgeNMWPOmRJCPs
 J2iKRIDyBFIV2fUPDuyV5r7mfJMlNKpLYfl9pMr9xP/UgIJyLuqs05vIVX5GdGwh
 LiAlgS8H9ywNjpu6deIs+68cQfQjKFw2tGsM7SvG4nmIF9NKxy5QkwGOwRvHSMrA
 ZnE23Q1y75wr8dOkTbA0SvPpRfebAO587L1v1qCzxvUXnGvX+Z+4QnpRywq6UaE5
 FTH4tLtlUA9pL9Yfo4nX2vAaTp8QwJ41nq4PRxJ1bWJYGCPuk5NsPcGslGsUoCZZ
 1tVOklyOJBvQHA==
 =xEwN
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - fix the sifive-l2-cache device tree bindings for json-schema
   compatibility. This does not change the intended behavior of the
   binding.

 - avoid improperly freeing necessary resources during early boot.

* tag 'riscv-for-linus-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix a number of free'd resources in init_resources()
  dt-bindings: sifive-l2-cache: Fix 'select' matching
2021-08-21 11:04:26 -07:00
Linus Torvalds
5479a7fe89 s390 updates for 5.14-rc7
- fix use after free of zpci_dev in pci code
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmEgv1cACgkQjYWKoQLX
 FBgJ/gf+PXb5Eanf0xz/9u1q2c3wf1kVAbhr1vEiPQK3JCNyyN0ZDPq2IV27nipz
 ivGc8kSFlu+G4/otNfbcWr8hswuLUvplV/E5xwTFCy//tIo7vMQDKGJ7Wkjy2Eql
 agmUg6umZNL4ErA4CORQQpkW9S0PDmK0BDjbDM93EGiFskt1vu28asBOe3ulHQ6n
 qH1ovvLLicB6wz5fJ1Ie3HRT+sQiyMBIIXBnSSt8eLqN4Wc+Nmv1J9I6+dLBWWVf
 N3V4xAZBmJQKHKkMw0WRXs+pnT8H2X92x6sjuRLqU/PeVLQuw4opY6Rsmhm08iJC
 boGZVuIVljed7d5xy3FTBrbzJ27beQ==
 =nwyD
 -----END PGP SIGNATURE-----

Merge tag 's390-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Vasily Gorbik:

 - fix use after free of zpci_dev in pci code

* tag 's390-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: fix use after free of zpci_dev
2021-08-21 10:56:06 -07:00
Linus Torvalds
15517c724c File locking change for v5.14
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAmEg5JwTHGpsYXl0b25A
 a2VybmVsLm9yZwAKCRAADmhBGVaCFfX3D/94AsPqwbnIyLgP2KjoltTKtnSsOZba
 tzTKwrA7vPB9b0aFg0i42omwRtGQk5jPaB8J+tBx+oc/JjO+Dpj5FOXOI9PjFr//
 2zXBuMdvYWaPlsl9nw5dFzhybdPnv+HgNXA8PRqy/DifU7/oLWxHzQ4sGvr1XXNc
 vow5KzLe1K4R+hQBwqKLh/9VVE4+foVTAwXWcFBi/RrDf5X97BF7s98JYsNxfpbn
 EYSyLr818TCUlivAUzx+y0JD+qUknZLuNYWZ2HkYvI29y5h1gUCVmn1orlCDdDq/
 G3SCF8M66l4xGW+bpzrEabqIDl+0IWhX+grU+UVdiMH4YuXmY4HPdNgKYIpJolyy
 zdg8cCO3OAHTJiwkj5jSfxFHQnzPr58LSQGDfrIrNjcDKlQbx3c+8R5yMy7Ar7jZ
 XAM6h88PAuBknTDWBQogZtuSqKbV8D2LsAUVgRKA7iKSwYXXUmZdW+UDiOavsHmg
 n2fbi1sPP7woQKyXzFddwxG3+2Nzby8BE7xuyHTXdOrQJNE1PvICH+WVo2m8+FCe
 uLKLXNf4UECO2MaSyd6k90v3AVty4u1EDTbitgcERztWGzazZdTVlmRdpwy35V+B
 wskKALzjGgqFoSwkEAnXQdk4+Gk9EtnrXD+HYfmxGYKk87fOBCgJtqDWYw3RHK7n
 giBfvji9HK3R8A==
 =xZLO
 -----END PGP SIGNATURE-----

Merge tag 'locks-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull mandatory file locking deprecation warning from Jeff Layton:
 "As discussed on the list, this patch just adds a new warning for folks
  who still have mandatory locking enabled and actually mount with '-o
  mand'. I'd like to get this in for v5.14 so we can push this out into
  stable kernels and hopefully reach folks who have mounts with -o mand.

  For now, I'm operating under the assumption that we'll fully remove
  this support in v5.15, but we can move that out if any legitimate
  users of this facility speak up between now and then"

* tag 'locks-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs: warn about impending deprecation of mandatory locks
2021-08-21 10:50:22 -07:00
Linus Torvalds
002c0aef10 block-5.14-2021-08-20
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEgaYcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppcLD/963Ld4YWLi1Chq6e2iqnUmuxPIgQbWCCD+
 RKNf3PRLwjVLsBKLjKHhp9fB9XhpmlXkxkLcYech9D2lFpnVf1tc0ziGCIpsNuGG
 X2UO8jfE6/XJ+laCyjTkoMlj2zWBJXSwwKx6JyDDOobYLiuVDUHeAcGVvLY/itvx
 tEtd+lXmz7cE41Q4cdoeJdmSOE54BAP5uCO66La5bv2r7xQN/nPWi+yg5UTV3GJB
 JuL+8RHyV3d4eiBF9Jg0izdp9vaUxUD3VmOjILmaG2wQy+Pbve9mMCZtTFMvSBcR
 Vw9B/fbNVon7YqOsrSCdIsfW066MqnIj55nRRETN6LxTGuzx6lQpJPSRXSDGKkR5
 SSckLXPKUcRPaX4Lc/SvgQpzvxhY3b9z3BRrIlxy8DWcZT7qq/bb41O9J4z6+jUn
 XIjKzvADLGqUqS/5zowyk/3vFHGnyhjYsRqMmpLCbjjxi5fSBbR+yorm5Vlx8auJ
 7iWHuNCGyUY/rMB1pibYhvT1dnNR6qOm/jTdHwjsb/QPDuCoU06TFnXbuSoefJlf
 ijfuwKQLgxkikICLHQ0uUHSuGhz1A8CwAZjz4rmTBSiFgQeM09v/pf4r+ymxrSgU
 n+yb4DAECIsyK8he3ePahFeJgsb0JMmz3ciSJJkK3im69jdLd28Xp4Vs0tgKmz8e
 2hMHgkXFSQ==
 =DGKg
 -----END PGP SIGNATURE-----

Merge tag 'block-5.14-2021-08-20' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes from Ming Lei that should go into 5.14:

   - Fix for a kernel panic when iterating over tags for some cases
     where a flush request is present, a regression in this cycle.

   - Request timeout fix

   - Fix flush request checking"

* tag 'block-5.14-2021-08-20' of git://git.kernel.dk/linux-block:
  blk-mq: fix is_flush_rq
  blk-mq: fix kernel panic during iterating over flush request
  blk-mq: don't grab rq's refcount in blk_mq_check_expired()
2021-08-21 08:11:22 -07:00
Linus Torvalds
1e6907d58c io_uring-5.14-2021-08-20
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEgaZ8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmFCEACzSqGGsFj7kxxxX45n8r9+3cnfSN2HRnBe
 rcKEEypK7ZyohfIfCQ+Qxhi6NCkYYtV3Enr7ErKUAbe8XpO0Eza84h0bjEBozSUa
 ZxyDVOVPHF4r0P6pGmYKNNijlPImx7ORL8kAfGTDLi0f3Qc1Chmkej1K9g5SBjIz
 qxzvv+I6L2OIokZwjmw/zCOD9GMvFtfPbL4xG1MEdXxH+aXtitkcnGlh7ZWxfFnz
 Mw4/QiyBOsya2clhvqT0FckED63wy5h0es/R0WMleIYQbSeORF2YZGejd7LoaeQj
 5PMvTeqv1f85lBo9BuUxp23/KDdwy0MOaKktOMFGWOkAvRH0ezjebNtUr6yBtXnt
 UyD/Wpdhpo0x6ZcpqB7UnwADKgd41GjPfQKqH/zNZta1vhfldg0mzuN6wfQpumg6
 j+fp/7uY/lwo5fpOF4CLwVrQEENi4Yir0aA4M6s53UknmrO26H0qHRM/uiHhmRVr
 gG5s5ZwJgabepb9VP7fCQ7NLMpfiijTPFEUahBgB5QTtVomUhsWY/XyODoGkF0PS
 frKe6+0IakmkX+fzNV3vTzchew8ySesQacU6y1ugl0cPqL9si1uqzYdv6+bOgry4
 NKlsRAqeYQAivQSdxdAE96+zIWnwJqQ6GKyvb/9XgPACGSWkzm7s5yXdXWKLwoBH
 FSPqTXs6Tw==
 =/PEi
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.14-2021-08-20' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few small fixes that should go into this release:

   - Fix never re-assigning an initial error value for io_uring_enter()
     for SQPOLL, if asked to do nothing

   - Fix xa_alloc_cycle() return value checking, for cases where we have
     wrapped around

   - Fix for a ctx pin issue introduced in this cycle (Pavel)"

* tag 'io_uring-5.14-2021-08-20' of git://git.kernel.dk/linux-block:
  io_uring: fix xa_alloc_cycle() error return value check
  io_uring: pin ctx on fallback execution
  io_uring: only assign io_uring_enter() SQPOLL error in actual error case
2021-08-21 08:06:26 -07:00
Jeff Layton
fdd92b64d1 fs: warn about impending deprecation of mandatory locks
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-08-21 07:32:45 -04:00
Jens Axboe
a30f895ad3 io_uring: fix xa_alloc_cycle() error return value check
We currently check for ret != 0 to indicate error, but '1' is a valid
return and just indicates that the allocation succeeded with a wrap.
Correct the check to be for < 0, like it was before the xarray
conversion.

Cc: stable@vger.kernel.org
Fixes: 61cf93700f ("io_uring: Convert personality_idr to XArray")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-20 14:59:58 -06:00
Linus Torvalds
fa54d366a6 ACPI fixes for 5.14-rc7
- Prevent confusing messages from being printed if the PRMT table
    is not present or there are no PRM modules (Aubrey Li).
 
  - Fix the handling of suspend-to-idle entry and exit in the case
    when the Microsoft UUID is used with the Low-Power S0 Idle _DSM
    interface (Mario Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEgAlESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxqtUP/17Ij8yvXGV2INb2S744FyLgvjk0FdgZ
 xcRhqEdYdPVqaZ75myTo+pSl7uArusO5HrKG4nRx7ATWovpb6+xK9H+ZmxyZfDkx
 Y1tb4kM3ms17JOFs8KGFtcQA9V+qJZPK7JRXGaVw9rh1B7fAf/8T/gKH2/9Yh2hn
 F0QrZUo0ihvcRBGeznoT/LxHy4hs5NEaEKf4AUcwdkOFPl/T+HspUESXY9SgVK4p
 jKdumPe9ws/GukiLsRFNMPCj73W4Ejab7E4JqCWfvFyAkDih0Ud9MPNpdCb2nxhl
 XJb1TGkXuW7PR2K+lgJBWTgRUmf4K/AbH3v7dR0B33PQfQru+BP9X6x8dDBn47Uu
 Etpoxoh8is+AQ66x3yc8P4dWH297/zF9XkYGr5S9y73QJFs9+UKDTpZNIEKrRllF
 sHCAueZmj45f3vjJeoNBzbOg6cu/TPNqPizGUphQU4t6SMEbCMU8sAlq12lLCM+4
 doQDq7nS4x2hYpDSsR8m+61TNdzclsa64L2AZGUuwzFht2EqHt+aKpzv3cVS3rES
 M8Q1w3ViTmOEKp4SlfRNaFh0NUDT7/kH+i7UzwFMqvny/CVbW0LX50RCois+bYMQ
 3wUUCOMwGgmdIe+/LkiFV+deKcETR2bsFPbIdmvzjqCU5AhQRS8ihwKeZk5ozjnM
 Ae70TtKyM1R7
 =GBtH
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix two mistakes in new code.

  Specifics:

   - Prevent confusing messages from being printed if the PRMT table is
     not present or there are no PRM modules (Aubrey Li).

   - Fix the handling of suspend-to-idle entry and exit in the case when
     the Microsoft UUID is used with the Low-Power S0 Idle _DSM
     interface (Mario Limonciello)"

* tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
  ACPI: PRM: Deal with table not present or no module found
2021-08-20 13:44:25 -07:00
Linus Torvalds
cae6876458 Power management fixes for 5.14-rc7
- Fix unuseful WARN() in the OPP core and prevent a noisy warning
    from being printed by OPP _put functions (Dmitry Osipenko).
 
  - Fix error path when allocation failed in the arm_scmi cpufreq
    driver (Lukasz Luba).
 
  - Blacklist Qualcomm sc8180x and Qualcomm sm8150 in
    cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath).
 
  - Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq
    driver (Marek Behún).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEgAqQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxbG8P/ioTb6ZpoE+/Nx1QqnhJ0eOBoOE0ynjc
 2llwknuFox4ZyOau6d+WZgxzs4Ka3XEwhhO4Ki0qjMMeag+jmyqZAPAAVB4ED1g1
 dHyWOsNA2Bf/ONs1iK60WBlxU9FnWeQoqqEyywUjL0XEnsYqSnaEHR1gzv5zy2Cn
 pJiE4db6jFj1OXsOZGCJSptZsoGoCaPyiv2MvpVTjyuCz5fV3gWJnvEE5LslvNMP
 l3TMGSKtnbTNCOCpkuOVPObXaNGzWXp3RNoGRyp8r7VbQfbVu1Oqhv3EbaeGVW1X
 vmWr+qC/zbYEE/JTvdb8riBO50UuYK570yo39IQIRTqyjK/DfB/E7FTq+amQsA8y
 GtwEiBCZGhcJWn7B5xLjREyPDnuPLiG5x71MTP/gdMmx5DRkp/3eB6c6e11xapLH
 fLeuV4cEt38SbAyuznj5cr0lkzu/E7tqQyt9Ol2r9hyoW5N3G1ubAh7nx+HOEw3Z
 2/29knqM9XXbYw/DTv+9EL5KfOPPd9+NRC4ZycNhbgDov7y88HfJdfMVZRwsaTHK
 iF+WGotJz4HI/DH7qTID4IlLIth9sTCCV5SXaUURp9/OCwbMYVZCLPM2vB/aN4ir
 dt6ojtrhz6Rvn4KohbSUmwkKetW8789qLblOsv+5+mYO56eBeH7RP0qd/V1OyZw3
 UNkOrQz0Nm6F
 =lQtW
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix some issues in the ARM cpufreq drivers and in the operating
  performance points (OPP) framework.

  Specifics:

   - Fix useless WARN() in the OPP core and prevent a noisy warning
     from being printed by OPP _put functions (Dmitry Osipenko).

   - Fix error path when allocation failed in the arm_scmi cpufreq
     driver (Lukasz Luba).

   - Blacklist Qualcomm sc8180x and Qualcomm sm8150 in
     cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath).

   - Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq
     driver (Marek Behún)"

* tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  opp: Drop empty-table checks from _put functions
  cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
  cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
  cpufreq: arm_scmi: Fix error path when allocation failed
  opp: remove WARN when no valid OPPs remain
  cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
2021-08-20 13:38:42 -07:00
Linus Torvalds
ed3bad2e4f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 patches.

  Subsystems affected by this patch series: MAINTAINERS and mm (shmem,
  pagealloc, tracing, memcg, memory-failure, vmscan, kfence, and
  hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlb: don't pass page cache pages to restore_reserve_on_error
  kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE
  mm: vmscan: fix missing psi annotation for node_reclaim()
  mm/hwpoison: retry with shake_page() for unhandlable pages
  mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  MAINTAINERS: update ClangBuiltLinux IRC chat
  mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names
  mm/page_alloc: don't corrupt pcppage_migratetype
  Revert "mm: swap: check if swap backing device is congested or not"
  Revert "mm/shmem: fix shmem_swapin() race with swapoff"
2021-08-20 13:08:56 -07:00
Linus Torvalds
8ba9fbe1e4 drm fixes for 5.14-rc7
core:
 - fix drm_wait_vblank uapi copying bug
 
 ttm:
 - fix debugfs init when debugfs is off
 
 amdgpu:
 - vega10 SMU workload fix
 - DCN VM fix
 - DCN 3.01 watermark fix
 
 amdkfd:
 - SVM fix
 
 nouveau:
 - ampere display fixes
 - remove MM misfeature to fix a longstanding race condition
 
 i915:
 - tweaked display workaround for all PCHs
 - eDP MSO pipe sanity for ADL-P fix
 - remove unused symbol export
 
 mediatek:
 - AAL output size setting
 - Delete component in remove function
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmEfPsgACgkQDHTzWXnE
 hr5BEg//QPG86+IybghlzsLRtzyC/YnE12gkkREmq67gNqY8rVUdC8jR7nq2OJWa
 S1M6WR0wLR9mfw+zLr6HXnaP5+KqGe0JqnJVIyMx50LSL8qJzttO5wHzp+G2tI7C
 +lPozccei+VirWj351Qall+TmV1s2I1h/QnW/CyGEa/uQ8iZXr7+1fad3st8194c
 Lzb+qFvwykEa4jeo8oD2OHiCHCGkTkI1cnSt9g7/BMCzRr5KuNaFAd5hvSUaQyfD
 5FX+0/sPiGRzIbLDHU/xTVvdvDcSW0w8+x93EcPkaGLfQd7FNzOCoEOwdYvBqcSL
 p49rIEwfAltyHKKvqajD05DUAe+XS+LGhN99ykW1zoO+s9qPchzNrAaGImrZlbw+
 JAruyxW2NhxBBXGlx0AgyCBdbWjydosfIToj/GXf7uXZRkxxtNQlFj56i+PT0hx3
 Xadc7BBNZvHYY5JCi9oEDttrsQUexRy9XWTfN9o1I7oN4Uuksn7CtRXSHG/UfYa2
 90z6rVTmAaS8h/3Ir5FMBY/O8aQbKHK27Rui9vgSmRNMwxKUun2+UjKJLC8DFZFu
 zx4mc5p+ByC3GmLb9G4FOj1DI5ZeNIHrX/JD6Lf78ooje+vW5mme1iMGQp+jlb6C
 FAtpNteDMeqY/b5piJDoJ0qWxqm0YqL2K5S1mDauPPthAu73uS8=
 =7B9v
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regularly scheduled fixes. The ttm one solves a problem of GPU drivers
  failing to load if debugfs is off in Kconfig, otherwise the i915 and
  mediatek, and amdgpu fixes all fairly normal.

  Nouveau has a couple of display fixes, but it has a fix for a
  longstanding race condition in it's memory manager code, and the fix
  mostly removes some code that wasn't working properly and has no
  userspace users. This fix makes the diffstat kinda larger but in a
  good (negative line-count) way.

  core:
   - fix drm_wait_vblank uapi copying bug

  ttm:
   - fix debugfs init when debugfs is off

  amdgpu:
   - vega10 SMU workload fix
   - DCN VM fix
   - DCN 3.01 watermark fix

  amdkfd:
   - SVM fix

  nouveau:
   - ampere display fixes
   - remove MM misfeature to fix a longstanding race condition

  i915:
   - tweaked display workaround for all PCHs
   - eDP MSO pipe sanity for ADL-P fix
   - remove unused symbol export

  mediatek:
   - AAL output size setting
   - Delete component in remove function"

* tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Use DCN30 watermark calc for DCN301
  drm/i915/dp: remove superfluous EXPORT_SYMBOL()
  drm/i915/edp: fix eDP MSO pipe sanity checks for ADL-P
  drm/i915: Tweaked Wa_14010685332 for all PCHs
  drm/nouveau: rip out nvkm_client.super
  drm/nouveau: block a bunch of classes from userspace
  drm/nouveau/fifo/nv50-: rip out dma channels
  drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
  drm/nouveau/disp: power down unused DP links during init
  drm/nouveau: recognise GA107
  drm: Copy drm_wait_vblank to user before returning
  drm/amd/display: Ensure DCN save after VM setup
  drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure
  drm/amd/pm: change the workload type for some cards
  Revert "drm/amd/pm: fix workload mismatch on vega10"
  drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails
  drm/mediatek: Add component_del in OVL and COLOR remove function
  drm/mediatek: Add AAL output size configuration
2021-08-20 12:59:54 -07:00