Commit graph

2023 commits

Author SHA1 Message Date
Bob Peterson
09f8ac747f gfs2: Don't demote a glock until its revokes are written
[ Upstream commit df5db5f9ee ]

Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.

This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:48:41 +02:00
Al Viro
777179200c gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
commit 2103913265 upstream.

with the way fs/namei.c:do_last() had been done, ->atomic_open()
instances needed to recognize the case when existing file got
found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
or failing themselves.  gfs2 one didn't.

Fixes: 6d4ade986f (GFS2: Add atomic_open support)
Cc: stable@kernel.org # v3.11
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:21 +01:00
Andreas Gruenbacher
4b67a516c6 gfs2: fix O_SYNC write handling
commit 6e5e41e2dc upstream.

In gfs2_file_write_iter, for direct writes, the error checking in the buffered
write fallback case is incomplete.  This can cause inode write errors to go
undetected.  Fix and clean up gfs2_file_write_iter along the way.

Based on a proposed fix by Christoph Hellwig <hch@lst.de>.

Fixes: 967bcc91b0 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:07 -08:00
Christoph Hellwig
e57e77e932 gfs2: move setting current->backing_dev_info
commit 4c0e8dda60 upstream.

Set current->backing_dev_info just around the buffered write calls to
prepare for the next fix.

Fixes: 967bcc91b0 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:07 -08:00
Bob Peterson
0809e1087c gfs2: fix glock reference problem in gfs2_trans_remove_revoke
[ Upstream commit fe5e7ba11f ]

Commit 9287c6452d fixed a situation in which gfs2 could use a glock
after it had been freed. To do that, it temporarily added a new glock
reference by calling gfs2_glock_hold in function gfs2_add_revoke.
However, if the bd element was removed by gfs2_trans_remove_revoke, it
failed to drop the additional reference.

This patch adds logic to gfs2_trans_remove_revoke to properly drop the
additional glock reference.

Fixes: 9287c6452d ("gfs2: Fix occasional glock use-after-free")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17 20:35:55 +01:00
Bob Peterson
7baf8fd1ff gfs2: take jdata unstuff into account in do_grow
[ Upstream commit bc0205612b ]

Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05 09:20:36 +01:00
Andreas Gruenbacher
fa3fe5f442 gfs2: Fix marking bitmaps non-full
[ Upstream commit ec23df2b0c ]

Reservations in gfs can span multiple gfs2_bitmaps (but they won't span
multiple resource groups).  When removing a reservation, we want to
clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of
the first bitmap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01 09:16:15 +01:00
Tim Smith
4d7cf69b77 GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
[ Upstream commit 1eb8d73879 ]

Flushing the workqueue can cause operations to happen which might
call gfs2_log_reserve(), or get stuck waiting for locks taken by such
operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
will never wake because the only thing which can wake it is gfs2_logd()
which was already stopped.

This causes umount of a gfs2 filesystem to wedge permanently if, for
example, the umount immediately follows a large delete operation.

When this occured, the following stack trace was obtained from the
umount command

[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Tim Smith <tim.smith@citrix.com>
Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24 08:20:48 +01:00
Bob Peterson
f3afad5d1e gfs2: slow the deluge of io error messages
[ Upstream commit b524abcc01 ]

When an io error is hit, it calls gfs2_io_error_bh_i for every
journal buffer it can't write. Since we changed gfs2_io_error_bh_i
recently to withdraw later in the cycle, it sends a flood of
errors to the console. This patch checks for the file system already
being withdrawn, and if so, doesn't send more messages. It doesn't
stop the flood of messages, but it slows it down and keeps it more
reasonable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24 08:20:26 +01:00
Bob Peterson
48b128cddb gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated
[ Upstream commit 4f36cb36c9 ]

The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when
a rgrp buffer is valid. It's cleared when the glock is invalidated,
signifying that the buffer data is now invalid. But before this
patch, function update_rgrp_lvb was setting the flag when it
determined it had a valid lvb. But that's an invalid assumption:
just because you have a valid lvb doesn't mean you have valid
buffers. After all, another node may have made the lvb valid,
and this node just fetched it from the glock via dlm.

Consider this scenario:
1. The file system is mounted with RGRPLVB option.
2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks
   to GL_SKIP, it skips the gfs2_rgrp_bh_get.
3. Since loops == 0 and the allocation target (ap->target) is
   bigger than the largest known chunk of blocks in the rgrp
   (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses
   the call to gfs2_rgrp_bh_get there as well.
4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses
   gfs2_rgrp_bh_get, but it still sets sets GFS2_RDF_UPTODATE due
   to this invalid assumption.
5. The next time update_rgrp_lvb is called, it sees the bit is set
   and just returns 0, assuming both the lvb and rgrp are both
   uptodate. But since this is a smaller allocation, or space has
   been freed by another node, thus adjusting the lvb values,
   it decides to use the rgrp for allocations, with invalid rd_free
   due to the fact it was never updated.

This patch changes update_rgrp_lvb so it doesn't set the UPTODATE
flag anymore. That way, it has no choice but to fetch the latest
values.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20 18:45:31 +01:00
Bob Peterson
e0c1e6e55b gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
commit f0b444b349 upstream.

In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.

This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.

Fixes: d552a2b9b3 ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:07 +02:00
Andreas Gruenbacher
21344f0575 gfs2: gfs2_walk_metadata fix
commit a27a0c9b6a upstream.

It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size.  This will confuse fiemap as well as lseek with the
SEEK_DATA flag.

Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.

Fixes xfstest generic/490.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-16 10:12:41 +02:00
Andreas Gruenbacher
c4b51dbccc gfs2: Fix occasional glock use-after-free
[ Upstream commit 9287c6452d ]

This patch has to do with the life cycle of glocks and buffers.  When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list.  Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke.  At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node.  This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list.  The
evict decremented the glock reference count to zero, which allowed the
glock to be freed.  After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31 06:46:07 -07:00
Ross Lagerwall
bac8520892 gfs2: Fix lru_count going negative
[ Upstream commit 7881ef3f33 ]

Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31 06:46:06 -07:00
Andreas Gruenbacher
fdc78eedc5 gfs2: Fix sign extension bug in gfs2_update_stats
commit 5a5ec83d6a upstream.

Commit 4d207133e9 changed the types of the statistic values in struct
gfs2_lkstats from s64 to u64.  Because of that, what should be a signed
value in gfs2_update_stats turned into an unsigned value.  When shifted
right, we end up with a large positive value instead of a small negative
value, which results in an incorrect variance estimate.

Fixes: 4d207133e9 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31 06:46:02 -07:00
Andreas Gruenbacher
4f5a4c8881 gfs2: Fix missed wakeups in find_insert_glock
commit 605b0487f0 upstream.

Mark Syms has reported seeing tasks that are stuck waiting in
find_insert_glock.  It turns out that struct lm_lockname contains four padding
bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
hashing the glock name.  As a result, we can end up waking up the wrong
waitqueue, and the waiting tasks may be stuck forever.

Fix that by using ht_parms.key_len instead of sizeof(struct lm_lockname) for
the key length.

Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13 14:02:40 -07:00
Andreas Gruenbacher
8b9be9db8a gfs2: Revert "Fix loop in gfs2_rbm_find"
commit e74c98ca2d upstream.

This reverts commit 2d29f6b96d.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:13 +01:00
Andreas Gruenbacher
6ef56c9ad7 gfs2: Fix loop in gfs2_rbm_find
commit 2d29f6b96d upstream.

Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke.  The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.

Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:07 +01:00
Andreas Gruenbacher
310486107d gfs2: Get rid of potential double-freeing in gfs2_create_inode
commit 6ff9b09e00 upstream.

In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be.  In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.

Fix that by setting the pointers to NULL after releasing the acls.  Slightly
simplify the logic.  Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.

Fixes: e01580bf9e ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:07 +01:00
Andreas Gruenbacher
38084377af gfs2: Fix iomap buffer head reference counting bug
commit c26b5aa8ef upstream.

GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private.  It sets that private pointer in
gfs2_iomap_get.  Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.

To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01 09:37:34 +01:00
Andrew Price
331bd738eb gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd
commit 4c62bd9cea upstream.

When alloc_percpu() fails, sdp gets freed but sb->s_fs_info still points
to the same address. Move the assignment after that error check so that
s_fs_info can only point to a valid sdp or NULL, which is checked for
later in the error path, in gfs2_kill_super().

Reported-by: syzbot+dcb8b3587445007f5808@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01 09:37:27 +01:00
Andreas Gruenbacher
55795dac8e gfs2: Fix metadata read-ahead during truncate (2)
commit e7445ceddf upstream.

The previous attempt to fix for metadata read-ahead during truncate was
incorrect: for files with a height > 2 (1006989312 bytes with a block
size of 4096 bytes), read-ahead requests were not being issued for some
of the indirect blocks discovered while walking the metadata tree,
leading to significant slow-downs when deleting large files.  Fix that.

In addition, only issue read-ahead requests in the first pass through
the meta-data tree, while deallocating data blocks.

Fixes: c3ce5aa9b0 ("gfs2: Fix metadata read-ahead during truncate")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21 09:19:24 +01:00
Andreas Gruenbacher
8793f67af1 gfs2: Put bitmap buffers in put_super
commit 10283ea525 upstream.

gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects
attached to the resource group glocks.  That function should release the
buffers attached to the gfs2_bitmap objects (bi_bh), but the call to
gfs2_rgrp_brelse for doing that is missing.

When gfs2_releasepage later runs across these buffers which are still
referenced, it refuses to free them.  This causes the pages the buffers
are attached to to remain referenced as well.  With enough mount/unmount
cycles, the system will eventually run out of memory.

Fix this by adding the missing call to gfs2_rgrp_brelse in
gfs2_clear_rgrpd.

(Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.)

Fixes: 39b0f1e929 ("GFS2: Don't brelse rgrp buffer_heads every allocation")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21 09:19:24 +01:00
Al Viro
8c448126af gfs2_meta: ->mount() can get NULL dev_name
commit 3df629d873 upstream.

get in sync with mount_bdev() handling of the same

Reported-by: syzbot+c54f8e94e6bba03b04e9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13 11:08:43 -08:00
Andreas Gruenbacher
fee5150c48 gfs2: Fix iomap buffered write support for journaled files (2)
It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.

Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed").  We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.

Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.

This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12 17:14:42 +02:00
Andreas Gruenbacher
dc480feb45 gfs2: Fix iomap buffered write support for journaled files
Commit 64bc06bb32 broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done.  However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs.  Fix
that by creating buffer heads ourself when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-09 18:20:13 +02:00
Linus Torvalds
2b2f2aedba gfs2 4.19 merge
Changes on top of v4.18-rc1 / iomap-4.19-merge-1:
 
 1. Iomap support for buffered writes and for direct I/O.
 2. Two patches that reduce the size of struct gfs2_inode.
 3. Lots of fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbcakSAAoJENW/n+sDE2U6AR0QAJlai+92ERML2pM+1hiuEHWP
 KizBoV/53pc+dll1dlcEOHQFys2vbcFavcCtcsTXNhLSp1wOqxyzcFQTX6ekWfSZ
 hTvJvAKTbeXu0zOWSV2DcX40JWb7SKDAxjxNb8XhL0COilgM9r+mdqoY/UNyVSel
 SVmWvs8UYt6UBnw4G8h5UlzSYxl/M64udU1pVO5D8JMQ5cxDKj3kfFoJLLKBDwLF
 vaNFxiihdTzmMwMNo3Px7GFSsb5Jnyo9LgAoDKsYd9YlzqGpAvvoYXH8itj4TuSb
 sM1KTUZK+97XvquZfuv5BniEifP7XZSq4xYIxyr9HMaOefeys0GdzaCSCb3ifFte
 7bqjowlAbHWwBNa9ofuJ1NShsAiOv0GUGDzlY+T/0IgSlqRr0JxAikJ3jLIZQ1Hf
 CwWY66XakeSi5euDTi41SuGZMcxTXaX15VbXl6/SGsv4X0dyVXleBz6RuC9Q+n2H
 7nqlGppRW2NB1WUqkJ15n9JaNLAF5I6umERTBXKGODM56p/GmZYoScCEGrqj9obN
 CntPtL6yrazASjV3+zqXA//OvTb3xfykYu17wVLKhXWD0YWQiuDfA481BLvEut2G
 aTtNU3b4VDwv5NuBf7G3wvN0+v3WyJ3gRfhTaEdFnX5PpcH3eHz5/fzU7zCJJwDu
 g53icn3efqNu7WvpAkB+
 =oGs4
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.19.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - iomap support for buffered writes and for direct I/O

 - two patches that reduce the size of struct gfs2_inode

 - lots of fixes and cleanups

* tag 'gfs2-4.19.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (25 commits)
  gfs2: eliminate update_rgrp_lvb_unlinked
  gfs2: Fix gfs2_testbit to use clone bitmaps
  gfs2: Get rid of gfs2_ea_strlen
  gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_out
  gfs2: Special-case rindex for gfs2_grow
  GFS2: rgrp free blocks used incorrectly
  gfs2: remove redundant variable 'moved'
  gfs2: use iomap_readpage for blocksize == PAGE_SIZE
  gfs2: Use iomap for stuffed direct I/O reads
  gfs2: fallocate_chunk: Always initialize struct iomap
  GFS2: Fix recovery issues for spectators
  fs: gfs2: Adding new return type vm_fault_t
  gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
  gfs2: Don't reject a supposedly full bitmap if we have blocks reserved
  gfs2: Eliminate redundant ip->i_rgd
  gfs2: Stop messing with ip->i_rgd in the rlist code
  gfs2: Remove gfs2_write_{begin,end}
  gfs2: iomap direct I/O support
  gfs2: gfs2_extent_length cleanup
  gfs2: iomap buffered write support
  ...
2018-08-15 22:40:03 -07:00
Bob Peterson
f5580d0f8b gfs2: eliminate update_rgrp_lvb_unlinked
Function update_rgrp_lvb_unlinked used to do the same thing as
be32_add_cpu. This patch removes it in favor of using be32_add_cpu
directly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2018-08-08 10:34:39 -05:00
Bob Peterson
dffe12a828 gfs2: Fix gfs2_testbit to use clone bitmaps
Function gfs2_testbit is called in three places. Two of those places,
gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
by the block reservations scheme to determine the length of an extent of
free blocks. Before this patch, it wasn't using the clone bitmap, which
means recently-freed blocks were treated as free blocks for the purposes
of an allocation.

This patch adds a new parameter to gfs2_testbit to indicate whether or
not the clone bitmaps should be used (if available).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-08-07 10:07:00 -05:00
Andreas Gruenbacher
21e2156f3c gfs2: Get rid of gfs2_ea_strlen
Function gfs2_ea_strlen is only called from ea_list_i, so inline it
there.  Remove the duplicate switch statement and the creative use of
memcpy to set a null byte.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-08-03 13:20:02 +01:00
Bob Peterson
3f30f929bb gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_out
Before this patch gfs2_rgrp_ondisk2lvb was called after every call
to gfs2_rgrp_out. This patch just calls it directly from within
gfs2_rgrp_out, and moves the function to be before it so we don't
need a function prototype.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-26 14:49:43 -05:00
Andreas Gruenbacher
776125785a gfs2: Special-case rindex for gfs2_grow
To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated.  This assumption
is incorrect for preallocates files, but there are no negative
consequences as long as *some* space is still left on the filesystem.

One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.

However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated.  To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 22:56:14 +02:00
Bob Peterson
f6753df35c GFS2: rgrp free blocks used incorrectly
Before this patch, several functions in rgrp.c checked the value of
rgd->rd_free_clone. That does not take into account blocks that were
reserved by a multi-block reservation. This causes a problem when
space gets tight in the file system. For example, when function
gfs2_inplace_reserve checks to see if a rgrp has enough blocks to
satisfy the request, it can accept a rgrp that it should reject
because, although there are enough blocks to satisfy the request
_now_, those blocks may be reserved for another running process.

A second problem with this occurs when we've reserved the remaining
blocks in an rgrp: function rg_mblk_search() can reject an rgrp
improperly because it calculates:

   u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved;

But rd_reserved includes blocks that the current process just
reserved in its own call to inplace_reserve. For example, it can
reserve the last 128 blocks of an rgrp, then reject that same rgrp
because the above calculates out to free_blocks = 0;

Consequences include, but are not limited to, (1) leaving holes,
and thus increasing file system fragmentation, and (2) reporting
file system is full long before it actually is.

This patch introduces a new function, rgd_free, which returns the
number of clone-free blocks (blocks that are truly free as opposed
to blocks that are still being used because an unlinked file is
still open) minus the number of blocks reserved by processes, but
not counting the blocks we ourselves reserved (because obviously
we need to allocate them).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25 00:09:09 +02:00
Colin Ian King
d1b0cb933c gfs2: remove redundant variable 'moved'
Variable 'moved' s being assigned but is never used hence it is
redundant and can be removed.  This has been the case ever since commit
c752666c.

Cleans up clang warning:
warning: variable 'moved' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25 00:08:59 +02:00
Andreas Gruenbacher
f95cbb44ab gfs2: use iomap_readpage for blocksize == PAGE_SIZE
We only use iomap_readpage for pages that don't have buffer heads
attached yet: iomap_readpage would otherwise read pages from disk that
are marked buffer_uptodate() but not PageUptodate().  Those pages may
actually contain data more recent than what's on disk.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 00:08:49 +02:00
Andreas Gruenbacher
1d45bb7f9d gfs2: Use iomap for stuffed direct I/O reads
Remove the fallback code from direct to buffered I/O for stuffed reads.

For stuffed writes, we must keep the fallback code: the deferred glock
we are holding under direct I/O doesn't allow to write to the inode or
change the file size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 00:08:40 +02:00
Andreas Gruenbacher
0ed91eca11 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
Merge xfs branch 'iomap-4.19-merge' into linux-gfs2/for-next.  This
brings in readpage and direct I/O support for inline data.

The IOMAP_F_BUFFER_HEAD flag introduced in commit "iomap: add initial
support for writes without buffer heads" needs to be set for gfs2 as
well, so do that in the merge.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25 00:08:20 +02:00
Andreas Gruenbacher
c25892827c gfs2: fallocate_chunk: Always initialize struct iomap
In fallocate_chunk, always initialize the iomap before calling
gfs2_iomap_get_alloc: future changes could otherwise cause things like
iomap.flags to leak across calls.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 00:06:48 +02:00
Bob Peterson
4a7727725d GFS2: Fix recovery issues for spectators
This patch fixes a couple problems dealing with spectators who
remain with gfs2 mounts after the last non-spectator node fails.

Before this patch, spectator mounts would try to acquire the dlm's
mounted lock EX as part of its normal recovery sequence.
The mounted lock is only used to determine whether the node is
the first mounter, the first node to mount the file system, for
the purposes of file system recovery and journal replay.

It's not necessary for spectators: they should never do journal
recovery. If they acquire the lock it will prevent another "real"
first-mounter from acquiring the lock in EX mode, which means it
also cannot do journal recovery because it doesn't think it's the
first node to mount the file system.

This patch checks if the mounter is a spectator, and if so, avoids
grabbing the mounted lock. This allows a secondary mounter who is
really the first non-spectator mounter, to do journal recovery:
since the spectator doesn't acquire the lock, it can grab it in
EX mode, and therefore consider itself to be the first mounter
both as a "real" first mount, and as a first-real-after-spectator.

Note that the control lock still needs to be taken in PR mode
in order to fetch the lvb value so it has the current status of
all journal's recovery. This is used as it is today by a first
mounter to replay the journals. For spectators, it's merely
used to fetch the status bits. All recovery is bypassed and the
node waits until recovery is completed by a non-spectator node.

I also improved the cryptic message given by control_mount when
a spectator is waiting for a non-spectator to perform recovery.

It also fixes a problem in gfs2_recover_set whereby spectators
were never queueing recovery work for their own journal.
They cannot do recovery themselves, but they still need to queue
the work so they can check the recovery bits and clear the
DFL_BLOCK_LOCKS bit once the recovery happens on another node.

When the work queue runs on a spectator, it bypasses most of the
work so it won't print a bunch of annoying messages. All it will
print is a bunch of messages that look like this until recovery
completes on the non-spectator node:

GFS2: fsid=mycluster:scratch.s: recover generation 3 jid 0
GFS2: fsid=mycluster:scratch.s: recover jid 0 result busy

These continue every 1.5 seconds until the recovery is done by
the non-spectator, at which time it says:

GFS2: fsid=mycluster:scratch.s: recover generation 4 done

Then it proceeds with its mount.

If the file system is mounted in spectator node and the last
remaining non-spectator is fenced, any IO to the file system is
blocked by dlm and the spectator waits until recovery is
performed by a non-spectator.

If a spectator tries to mount the file system before any
non-spectators, it blocks and repeatedly gives this kernel
message:

GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25 00:06:24 +02:00
Andreas Gruenbacher
a3479c7fc0 Merge branch 'iomap-write' into linux-gfs2/for-next
Pull in the gfs2 iomap-write changes: Tweak the existing code to
properly support iomap write and eliminate an unnecessary special case
in gfs2_block_map.  Implement iomap write support for buffered and
direct I/O.  Simplify some of the existing code and eliminate code that
is no longer used:

  gfs2: Remove gfs2_write_{begin,end}
  gfs2: iomap direct I/O support
  gfs2: gfs2_extent_length cleanup
  gfs2: iomap buffered write support
  gfs2: Further iomap cleanups

This is based on the following changes on the xfs 'iomap-4.19-merge'
branch:

  iomap: add private pointer to struct iomap
  iomap: add a page_done callback
  iomap: generic inline data handling
  iomap: complete partial direct I/O writes synchronously
  iomap: mark newly allocated buffer heads as new
  fs: factor out a __generic_write_end helper

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:40 +02:00
Souptick Joarder
109dbb1e6f fs: gfs2: Adding new return type vm_fault_t
Use new return type vm_fault_t for gfs2_page_mkwrite
handler.

see commit 1c8f422059 ("mm: change return type to
vm_fault_t") for reference.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:11 +02:00
Chengguang Xu
910f3d58d0 gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
It seems better to get size by calling posix_acl_xattr_size() instead of
calling posix_acl_to_xattr() with NULL buffer argument.

posix_acl_xattr_size() never returns 0, so remove the unnecessary check.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:11 +02:00
Bob Peterson
e79e0e1428 gfs2: Don't reject a supposedly full bitmap if we have blocks reserved
Before this patch, you could get into situations like this:

1. Process 1 searches for X free blocks, finds them, makes a reservation
2. Process 2 searches for free blocks in the same rgrp, but now the
   bitmap is full because process 1's reservation is skipped over.
   So it marks the bitmap as GBF_FULL.
3. Process 1 tries to allocate blocks from its own reservation, but
   since the GBF_FULL bit is set, it skips over the rgrp and searches
   elsewhere, thus not using its own reservation.

This patch adds an additional check to allow processes to use their
own reservations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:11 +02:00
Al Viro
44907d7900 get rid of 'opened' argument of ->atomic_open() - part 3
now it can be done...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:20 -04:00
Al Viro
b452a458ca getting rid of 'opened' argument of ->atomic_open() - part 2
__gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open()
don't need 'opened' anymore.  Get rid of that argument in those.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:20 -04:00
Al Viro
be12af3ef5 getting rid of 'opened' argument of ->atomic_open() - part 1
'opened' argument of finish_open() is unused.  Kill it.

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:19 -04:00
Al Viro
73a09dd943 introduce FMODE_CREATED and switch to it
Parallel to FILE_CREATED, goes into ->f_mode instead of *opened.
NFS is a bit of a wart here - it doesn't have file at the point
where FILE_CREATED used to be set, so we need to propagate it
there (for now).  IMA is another one (here and everywhere)...

Note that this needs do_dentry_open() to leave old bits in ->f_mode
alone - we want it to preserve FMODE_CREATED if it had been already
set (no other bit can be there).

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:18 -04:00
Al Viro
aad888f828 switch all remaining checks for FILE_OPENED to FMODE_OPENED
... and don't bother with setting FILE_OPENED at all.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12 10:04:18 -04:00
Andreas Gruenbacher
b7eba890a2 gfs2: Eliminate redundant ip->i_rgd
GFS2 remembers the last rgrp used for allocations in ip->i_rgd.
However, block allocations are made by way of a reservations structure,
ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res
is kept in sync with ip->i_res.rs_rgd, so it's redundant.  Get rid of
ip->i_rgd and just use ip->i_res.rs_rgd in its place.

Based on patches by Robert Peterson.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-07-05 17:47:16 +02:00
Andreas Gruenbacher
03f8c41c73 gfs2: Stop messing with ip->i_rgd in the rlist code
In the resource group list code, keep the last resource group added in
the last position in the array.  Check against that instead of messing
with ip->i_rgd.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-07-04 21:38:42 +01:00