Commit graph

59855 commits

Author SHA1 Message Date
Gao Xiang
618f40ea02 erofs: use read_cache_page_gfp for erofs_get_meta_page
As Christoph said [1], "I'd much prefer to just use
read_cache_page_gfp, and live with the fact that this
allocates bufferheads behind you for now.  I'll try to
speed up my attempts to get rid of the buffer heads on
the block device mapping instead. "

This simplifies the code a lot and a minor thing is
"no REQ_META (e.g. for blktrace) on metadata at all..."

[1] https://lore.kernel.org/r/20190903153704.GA2201@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-26-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
4231138fe0 erofs: always use iget5_locked
As Christoph said [1] [2], "Just use the slightly
more complicated 32-bit version everywhere so that
you have a single actually tested code path.
And then remove this helper. "

[1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
[2] https://lore.kernel.org/r/20190902125320.GA16726@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-25-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
fe7c242357 erofs: use read_mapping_page instead of sb_bread
As Christoph said [1], "This seems to be your only direct
use of buffer heads, which while not deprecated are a bit
of an ugly step child.  So if you can easily avoid creating
a buffer_head dependency in a new filesystem I think you
should avoid it. "

[1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-24-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
4f761fa253 erofs: rename errln/infoln/debugln to erofs_{err, info, dbg}
Add prefix "erofs_" to these functions and print
sb->s_id as a prefix to erofs_{err, info} so that
the user knows which file system is affected.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-23-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
84947eb603 erofs: save one level of indentation
As Christoph said [1], ".. and save one
level of indentation."

[1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-22-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
73d03931be erofs: kill use_vmap module parameter
As Christoph said [1],
"vm_map_ram is supposed to generally behave better.  So if
it doesn't please report that that to the arch maintainer
and linux-mm so that they can look into the issue.  Having
user make choices of deep down kernel internals is just
a horrible interface.

Please talk to maintainers of other bits of the kernel
if you see issues and / or need enhancements. "

Let's redo the previous conclusion and kill the vmap
approach.

[1] https://lore.kernel.org/r/20190830165533.GA10909@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-21-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:09 +02:00
Gao Xiang
e2c71e74b2 erofs: kill all erofs specific fault injection
As Christoph suggested [1], "Please just use plain kmalloc
everywhere and let the normal kernel error injection code
take care of injeting any errors."

[1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-20-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
99634bf388 erofs: add "erofs_" prefix for common and short functions
Add erofs_ prefix to free_inode, alloc_inode, ...

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-19-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
94e4e153b1 erofs: kill __submit_bio()
As Christoph pointed out [1], "
Why is there __submit_bio which really just obsfucates
what is going on?  Also why is __submit_bio using
bio_set_op_attrs instead of opencode it as the comment
right next to it asks you to? "

Let's use submit_bio directly instead.

[1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-18-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
e655b5b3a2 erofs: kill prio and nofail of erofs_get_meta_page()
As Christoph pointed out [1],
"Why is there __erofs_get_meta_page with the two weird
booleans instead of a single erofs_get_meta_page that
gets and gfp_t for additional flags and an unsigned int
for additional bio op flags."

And since all callers can handle errors, let's kill
prio and nofail and erofs_get_inline_page() now.

[1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-17-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
a5c0b7802c erofs: localize erofs_grab_bio()
As Christoph pointed out [1], "erofs_grab_bio tries to
handle a bio_alloc failure, except that the function will
not actually fail due the mempool backing it."

Sorry about useless code, fix it now and
localize erofs_grab_bio [2].

[1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/
[2] https://lore.kernel.org/r/20190902122016.GL15931@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-16-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
688a5f2ed4 erofs: kill verbose debug info in erofs_fill_super
As Christoph said [1], "That is some very verbose
debug info.  We usually don't add that and let
people trace the function instead. "

[1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-15-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
0259f20948 erofs: use dsb instead of layout for ondisk super_block
As Christoph pointed out [1], "Why is the variable name
for the on-disk subperblock layout? We usually still
calls this something with sb in the name, e.g. dsb.
for disksuper block. " Let's fix it.

[1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-14-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:08 +02:00
Gao Xiang
a2c75c8143 erofs: better erofs symlink stuffs
Fix as Christoph suggested [1] [2], "remove is_inode_fast_symlink
and just opencode it in the few places using it"

and
"Please just set the ops directly instead of obsfucating that in
a single caller, single line inline function.  And please set it
instead of the normal symlink iops in the same place where you
also set those."

[1] https://lore.kernel.org/r/20190830163910.GB29603@infradead.org/
[2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-13-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
2d78c209b9 erofs: update comments in inode.c
As Christoph suggested [1], update them all.

[1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-12-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
ea559e7b84 erofs: update erofs_fs.h comments
As Christoph said [1] [2], update it now.

[1] https://lore.kernel.org/r/20190902124521.GA22153@infradead.org/
[2] https://lore.kernel.org/r/20190902120548.GB15931@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-11-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
a5876e24f1 erofs: use erofs_inode naming
As Christoph suggested [1], "Why is this called vnode instead
of inode?  That seems like a rather odd naming for a Linux
file system."

[1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-10-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
1c2dfbf9c2 erofs: kill erofs_{init,exit}_inode_cache
As Christoph said [1] "having this function seems
entirely pointless", let's kill those.

filesystem                              function name
ext2,f2fs,ext4,isofs,squashfs,cifs,...  init_inodecache

In addition, add a necessary "rcu_barrier()" on exit_fs();

[1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-9-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
8a76568225 erofs: better naming for erofs inode related stuffs
updates inode naming
 - kill is_inode_layout_compression [1]
 - kill magic underscores [2] [3]
 - better naming for datamode & data_mapping_mode [3]
 - better naming erofs_inode_{compact, extended} [4]

[1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
[2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/
[3] https://lore.kernel.org/r/20190902122627.GN15931@infradead.org/
[4] https://lore.kernel.org/r/20190902125438.GA17750@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-8-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
426a930891 erofs: use feature_incompat rather than requirements
As Christoph said [1], "This is only cosmetic, why
not stick to feature_compat and feature_incompat?"

In my thought, requirements means "incompatible"
instead of "feature" though.

[1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-7-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
c39747f770 erofs: update erofs_inode_is_data_compressed helper
As Christoph said, "This looks like a really obsfucated
way to write:
	return datamode == EROFS_INODE_FLAT_COMPRESSION ||
		datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY; "

Although I had my own consideration, it's the right way for now.

[1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-6-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
ed34aa4a8a erofs: kill __packed for on-disk structures
As Christoph suggested "Please don't add __packed" [1],
remove all __packed except struct erofs_dirent here.

Note that all on-disk fields except struct erofs_dirent
(12 bytes with a 8-byte nid) in EROFS are naturally aligned.

[1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-5-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
b6796abd3c erofs: some macros are much more readable as a function
As Christoph suggested [1], these macros are much
more readable as a function.

[1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-4-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:07 +02:00
Gao Xiang
60a49ba8fe erofs: on-disk format should have explicitly assigned numbers
As Christoph suggested [1], on-disk format should have
explicitly assigned numbers.

[1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-3-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:06 +02:00
Gao Xiang
4b66eb51d2 erofs: remove all the byte offset comments
As Christoph suggested [1], "Please remove all the byte offset comments.
that is something that can easily be checked with gdb or pahole."

[1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190904020912.63925-2-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05 20:10:06 +02:00
Pratik Shinde
512f9922ee erofs: using switch-case while checking the inode type.
while filling the linux inode, using switch-case statement to check
the type of inode.
switch-case statement looks more clean here.

Signed-off-by: Pratik Shinde <pratikshinde320@gmail.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190830095615.10995-1-pratikshinde320@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04 08:31:54 +02:00
Gao Xiang
097a802ae1 erofs: reduntant assignment in __erofs_get_meta_page()
As Joe Perches suggested [1],
 		err = bio_add_page(bio, page, PAGE_SIZE, 0);
-		if (unlikely(err != PAGE_SIZE)) {
+		if (err != PAGE_SIZE) {
 			err = -EFAULT;
 			goto err_out;
 		}

The initial assignment to err is odd as it's not
actually an error value -E<FOO> but a int size
from a unsigned int len.

Here the return is either 0 or PAGE_SIZE.

This would be more legible to me as:

		if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
			err = -EFAULT;
			goto err_out;
		}

[1] https://lore.kernel.org/r/74c4784319b40deabfbaea92468f7e3ef44f1c96.camel@perches.com/
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829171741.225219-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-30 09:02:10 +02:00
Gao Xiang
8d8a09b093 erofs: remove all likely/unlikely annotations
As Dan Carpenter suggested [1], I have to remove
all erofs likely/unlikely annotations.

[1] https://lore.kernel.org/linux-fsdevel/20190829154346.GK23584@kadam/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829163827.203274-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-30 09:02:02 +02:00
Gao Xiang
47e4937a4a erofs: move erofs out of staging
EROFS filesystem has been merged into linux-staging for a year.

EROFS is designed to be a better solution of saving extra storage
space with guaranteed end-to-end performance for read-only files
with the help of reduced metadata, fixed-sized output compression
and decompression inplace technologies.

In the past year, EROFS was greatly improved by many people as
a staging driver, self-tested, betaed by a large number of our
internal users, successfully applied to almost all in-service
HUAWEI smartphones as the part of EMUI 9.1 and proven to be stable
enough to be moved out of staging.

EROFS is a self-contained filesystem driver. Although there are
still some TODOs to be more generic, we have a dedicated team
actively keeping on working on EROFS in order to make it better
with the evolution of Linux kernel as the other in-kernel filesystems.

As Pavel suggested, it's better to do as one commit since git
can do moves and all histories will be saved in this way.

Let's promote it from staging and enhance it more actively as
a "real" part of kernel for more wider scenarios!

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Pavel Machek <pavel@denx.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J . Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Fang Wei <fangwei1@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190822213659.5501-1-hsiangkao@aol.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-24 14:20:10 +02:00
Linus Torvalds
3039fadf2b for-5.3-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl1ZOygACgkQxWXV+ddt
 WDvokw/8CRjbN5Bjlk/JzmikH+mU28Cd7qQgahWw7Afkyh5Gzb4IJJbNXzapy988
 dMMKYF2H6lxY46EZG8cF4MFjCv8L4L1eZ9CqwfZyf7MfzPL2pnLSN77QJYYebWp3
 y9rc8xv+qUdIcumQP25yxXmtN0YxT5bIJiuCmpHJFNfyijHVRnXV2CxQ8nwe63/1
 G25a03x6BNKtSrU3ZP0fW2VjARKzIF7i+bEy4Ew3n3dNKqAiROYecswcBedhCBkF
 D9hL62uHcxQSeHi/6lAZYKpsp4g4pKEO4c92MxsI8PJtd4zgHQG7MttXsHC1F1FQ
 lHcP3vxKICOOMa13W6QdytSX6uSpjeLyMTDfmvaahQmwG6I5dBN56pkGlWdDMKNn
 NUCNBmV063D7Shed4W2uIafLfo3BLEwKr2pd6pOfOZHwOKPWblJ0v4KzVDLoKC7v
 CA5HB9BBz4KfSyp3fkm9B5VGJW938vRHVfx55IoakL+tfs67cKDYo8QEFHDuz0P5
 DrYNwKvp4a5llGQ6vdRKfeiN7vBea10795MI5vFJiGUHfY3pN99R4UUed7kaul58
 n6gqYukcPtIBqm8auxK037nngA+V0N2y0ceM1/aKUGaZVlCtSmEKXseYjbaiH6fP
 MEZSgZn4W5qnu2oQuKohfQsNHtY5WP9489GpByy+DS5QsbDaueg=
 =ovM7
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two fixes that popped up during testing:

   - fix for sysfs-related code that adds/removes block groups, warnings
     appear during several fstests in connection with sysfs updates in
     5.3, the fix essentially replaces a workaround with scope NOFS and
     applies to 5.2-based branch too

   - add sanity check of trim range"

* tag 'for-5.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: trim: Check the range passed into to prevent overflow
  Btrfs: fix sysfs warning and missing raid sysfs directories
2019-08-18 09:51:48 -07:00
Linus Torvalds
8fde2832bd for-linus-2019-08-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1Ygq4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjUnD/47XM1xHO2L/66+XiRnXwNc0Zmc0cNrtIsp
 09EqZFeSwaqJKfoodcDphuHSYf6jkLk9xRKiFEr+KrUfzwvmgN3fFlbk6Azz0A4f
 ACv4DtTPML40QUGzo5c+UfnJTwK8z4An/lG4aNDB3UucFwoWhfvLkzQAN+FNilAQ
 sMTWPvhjiV/E7BMa7p108/NL9sv0UzUzgIOI7cZCq/Z5/jJBiohl4DzPPyfCKzOK
 shKP3Q7zId2q7d9zJZZuwy/iYLTCEHzy+37hxS7343vXeTlxKbrxY6xKZ8Rhsccx
 E3ep+SYuRxl7BHXPTwFh5HSMRXq5JEVBy3dnoMzFEu6pqzAPyLjQS0tG/lR9T1uS
 E8t0QnsdjVvedF/c3a3b2pVcYCbRjHsB/5DLBPxRfgAPkvVm180XKQM95HkLQYle
 A74mgU1WODnXsdyJeAacf1b+3t8p+5x5vpRq6Ls6dgmGn74qhfvs4tQnqypFFnyR
 Le0O7ICWKotQFo0CzbPB029xORepq9HLkFEeh73K6uG+Bn5iGMC1oUmxB0uSuHpz
 h9lPDjjj6RSLpzp1jCqswSHsQi00sLSKMtytjEt83/EWPcyN2oiPP1BTq+9jITYR
 SvoKF6R+CLwQscb/DEXb/vRSehg5aivXHg1AOo0l6/OiPQmNdmSZEU4/McPoB2xW
 1MC5Y4wV1g==
 =f6DS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A collection of fixes that should go into this series. This contains:

   - Revert of the REQ_NOWAIT_INLINE and associated dio changes. There
     were still corner cases there, and even though I had a solution for
     it, it's too involved for this stage. (me)

   - Set of NVMe fixes (via Sagi)

   - io_uring fix for fixed buffers (Anthony)

   - io_uring defer issue fix (Jackie)

   - Regression fix for queue sync at exit time (zhengbin)

   - xen blk-back memory leak fix (Wenwen)"

* tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block:
  io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list
  block: remove REQ_NOWAIT_INLINE
  io_uring: fix manual setup of iov_iter for fixed buffers
  xen/blkback: fix memory leaks
  blk-mq: move cancel of requeue_work to the front of blk_exit_queue
  nvme-pci: Fix async probe remove race
  nvme: fix controller removal race with scan work
  nvme-rdma: fix possible use-after-free in connect error flow
  nvme: fix a possible deadlock when passthru commands sent to a multipath device
  nvme-core: Fix extra device_put() call on error path
  nvmet-file: fix nvmet_file_flush() always returning an error
  nvmet-loop: Flush nvme_delete_wq when removing the port
  nvmet: Fix use-after-free bug when a port is removed
  nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
2019-08-17 19:39:54 -07:00
Linus Torvalds
a69e90512d Changes since last update:
- Fix crashes when the attr fork isn't present due to errors but inode
   inactivation tries to zap the attr data anyway.
 - Convert more directory corruption debugging asserts to actual
   EFSCORRUPTED returns instead of blowing up later on.
 - Don't fail writeback just because we ran out of memory allocating
   metadata log data.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1RlXoACgkQ+H93GTRK
 tOtc7A/+JIidhI/MHQLs7Ab9GW+PsHHBMSbVTV4Ge+SlfZtPNI38zrC1MC5LWvvV
 bndOpRjLm4nOJcB7fsoEWufTs1dKOIUjk2yQi8x47ZvE+B/RcA4b6IDhwpbAI8GW
 kt1RLNec9kpzhxCFFPzsXT9MwjEvvOvTeXfxaXTmiuB2kbJkR5dTlCUS2nUDnqsG
 FGdmOUDjy1uVfFcSrp75KT/iYaqW08cG+uY/eUHRm+YMUKI8hF1t+n8cDnSg96VX
 IN2DT1d3dTWiiF+JUZnMhVwJvPgV95DOf+yYy/F7qOcJUEmQ9tD6+0Ml/cI/AeLG
 zERxHXM9A9Jy8S+2xkvf0J/+HStwfviWNToK3pbMIM1ZsoMTi9q8VgbB3AaFiijf
 C4Q4T3W0jC44om8X/Ta/c+G/64Tj8yenzLDeTHvtQkoq77QPBam/aYjBc79oYvHi
 r+R61kHNto+YjJsRbkwgF/S+bzru1qY9Ccr0LJZrUkSzh4d6p94fbQc+NX4L2sv7
 WzAc+kOR/7qgVgy4gVr3ju0d89kP/Xn/0e0Ma0V8CSZlX5yg1dMLew5TJq693UYX
 xjLGD2ltOoFEN8e7/WXI0/ktvvSCAQalmz+sPgJvTlosUhpGXky85ced1PSrKiEV
 l0tREpmawDo9WVvC/06yBj97Op6PDdb4CovDcyLT6Yt3v1aBZT0=
 =ivN3
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

 - Fix crashes when the attr fork isn't present due to errors but inode
   inactivation tries to zap the attr data anyway.

 - Convert more directory corruption debugging asserts to actual
   EFSCORRUPTED returns instead of blowing up later on.

 - Don't fail writeback just because we ran out of memory allocating
   metadata log data.

* tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: don't crash on null attr fork xfs_bmapi_read
  xfs: remove more ondisk directory corruption asserts
  fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().
2019-08-15 12:29:36 -07:00
Jackie Liu
a982eeb09b io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list
This patch may fix two issues:

First, when IOSQE_IO_DRAIN set, the next IOs need to be inserted into
defer list to delay execution, but link io will be actively scheduled to
run by calling io_queue_sqe.

Second, when multiple LINK_IOs are inserted together with defer_list,
the LINK_IO is no longer keep order.

   |-------------|
   |   LINK_IO   |      ----> insert to defer_list  -----------
   |-------------|                                            |
   |   LINK_IO   |      ----> insert to defer_list  ----------|
   |-------------|                                            |
   |   LINK_IO   |      ----> insert to defer_list  ----------|
   |-------------|                                            |
   |   NORMAL_IO |      ----> insert to defer_list  ----------|
   |-------------|                                            |
                                                              |
                              queue_work at same time   <-----|

Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15 11:21:39 -06:00
Jens Axboe
7b6620d7db block: remove REQ_NOWAIT_INLINE
We had a few issues with this code, and there's still a problem around
how we deal with error handling for chained/split bios. For now, just
revert the code and we'll try again with a thoroug solution. This
reverts commits:

e15c2ffa10 ("block: fix O_DIRECT error handling for bio fragments")
0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
893a1c9720 ("blk-mq: allow REQ_NOWAIT to return an error inline")

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15 11:09:16 -06:00
Aleix Roca Nonell
99c79f6692 io_uring: fix manual setup of iov_iter for fixed buffers
Commit bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed
buffers") introduced an optimization to avoid using the slow
iov_iter_advance by manually populating the iov_iter iterator in some
cases.

However, the computation of the iterator count field was erroneous: The
first bvec was always accounted for an extent of page size even if the
bvec length was smaller.

In consequence, some I/O operations on fixed buffers were unable to
operate on the full extent of the buffer, consistently skipping some
bytes at the end of it.

Fixes: bd11b3a391 ("io_uring: don't use iov_iter_advance() for fixed buffers")
Cc: stable@vger.kernel.org
Signed-off-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15 11:03:38 -06:00
Linus Torvalds
e22a97a2a8 AFS Fixes
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1UELcACgkQ+7dXa6fL
 C2viWw//eDLvElBjaQsabfpMOVGkf02t+zCzNfKSC0KdM1+GUZ1FyXQ0UAtgwICY
 Sp01h/RZa0V07sfYP7R5kL4/KIMdODmhrP0iiHDpoMjKCL7qR9tFbJDAcHtH8xz2
 52UV2dmdDBI/wdw/i5dn6M02SoYAQMl1XT49SkzhFSELVchkpraGsf1vf4yITeVe
 eI1TaOxI+TUaeH5f6+KWp6c8K8q70p3KfrR2VmCWkBrD7PNg9lp19pVnz8tdofYu
 xURHQbJulSqM+mY7pcNBOi2iWy3dCLjBTkVJIwIhZcZqLThACY38SSaPtmdhgif4
 wcyyZUtd8EGPzPPqbfCx7ycTIIDtL/r98XtGyiTJBKrCK+flZONdu0g/oIzvJ/Wu
 hV4+ButxCuMakbLOe+Hew3lhHFOy7m9XZtOURzxzZSm9uazHDMxnw4ocxIOs24F1
 qus1sG0+rlVDcMYjo2tKEAzOl/ZejJ/NUTd60ANIWKTHply2/2/5dH94B0yLwDnp
 tfifBrBkyqFB4XUKGvqvvJczl0d7+zsEScs4VQLVO/WhATjj6jNnrYKgwvBS5pCM
 890qUzj3TRW7ciZLi0THMEHBlEfbEWhNCaggAqieIvbKv7t4Kh2cUBaIsxo4IYqU
 PBZZhFXRul5ocTJrV9pScl4RbzxE5V0j9cwSiiWnzZL1sQucIgQ=
 =zivP
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs fixes from David Howells:

 - Fix the CB.ProbeUuid handler to generate its reply correctly.

 - Fix a mix up in indices when parsing a Volume Location entry record.

 - Fix a potential NULL-pointer deref when cleaning up a read request.

 - Fix the expected data version of the destination directory in
   afs_rename().

 - Fix afs_d_revalidate() to only update d_fsdata if it's not the same
   as the directory data version to reduce the likelihood of overwriting
   the result of a competing operation. (d_fsdata carries the directory
   DV or the least-significant word thereof).

 - Fix the tracking of the data-version on a directory and make sure
   that dentry objects get properly initialised, updated and
   revalidated.

   Also fix rename to update d_fsdata to match the new directory's DV if
   the dentry gets moved over and unhash the dentry to stop
   afs_d_revalidate() from interfering.

* tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix missing dentry data version updating
  afs: Only update d_fsdata if different in afs_d_revalidate()
  afs: Fix off-by-one in afs_rename() expected data version calculation
  fs: afs: Fix a possible null-pointer dereference in afs_put_read()
  afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()
  afs: Fix the CB.ProbeUuid service handler to reply correctly
2019-08-14 14:21:14 -07:00
NeilBrown
6a2aeab59e seq_file: fix problem when seeking mid-record
If you use lseek or similar (e.g.  pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the
next record).  When seeking to a record boundary, the next record is
correctly returned.

This bug was introduced by a recent patch (identified below).  Before
that patch, seq_read() would increment m->index when the last of the
buffer was returned (m->count == 0).  After that patch, we rely on
->next to increment m->index after filling the buffer - but there was
one place where that didn't happen.

Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Sergei Turchanov <turchanov@farpost.com>
Tested-by: Sergei Turchanov <turchanov@farpost.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13 16:06:52 -07:00
Darrick J. Wong
8612de3f7b xfs: don't crash on null attr fork xfs_bmapi_read
Zorro Lang reported a crash in generic/475 if we try to inactivate a
corrupt inode with a NULL attr fork (stack trace shortened somewhat):

RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
FS:  00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
Call Trace:
 xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
 xfs_da_read_buf+0xf5/0x2c0 [xfs]
 xfs_da3_node_read+0x1d/0x230 [xfs]
 xfs_attr_inactive+0x3cc/0x5e0 [xfs]
 xfs_inactive+0x4c8/0x5b0 [xfs]
 xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
 destroy_inode+0xbc/0x190
 xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
 xfs_bulkstat_one+0x16/0x20 [xfs]
 xfs_bulkstat+0x6fa/0xf20 [xfs]
 xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
 xfs_file_ioctl+0xee0/0x12a0 [xfs]
 do_vfs_ioctl+0x193/0x1000
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x9f/0x4d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f11d39a3e5b

The "obvious" cause is that the attr ifork is null despite the inode
claiming an attr fork having at least one extent, but it's not so
obvious why we ended up with an inode in that state.

Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-12 09:32:44 -07:00
Darrick J. Wong
858b44dc62 xfs: remove more ondisk directory corruption asserts
Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-12 09:32:44 -07:00
Linus Torvalds
b6c0649caf dax fixes v5.3-rc4
- Fix dax_layout_busy_page() to not discard private cow pages of fs/dax
   private mappings.
 
 - Update the memremap_pages core to properly cleanup on behalf of
   internal reference-count users like device-dax.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdUGLxAAoJEB7SkWpmfYgC62IP/3aHwBbdedlXves4NQ0QhN5z
 3jooOqpayfgkdPZp4U2XnNolsbKDME6h7m7Mn+GzzRSCfNGsdcnLHDWwXt4tTWE8
 rvDNPat+22oWSVqnZOTb5GfnZKGmAbk5eC2HI9HT2VPf/BWMIoU6/QhvIzeLCEf7
 g72XTEouitGgRk2Cn8Wi3+y+fvbMdur/0qofBH9rxfQEgTWiDtCJvxZ5KyH18hlt
 qbvJR0CG2mxIxEbM1qx/1/HysXgs3UeTJVzHioF5SLdGcyQP14Djp0MCuqTGiB6l
 aEgMYSJcca7nilJNMcc2gNEsNNuga6UDWaF52FJuAoy+3vs837iexP6L+VbthqTT
 70vAvEOnDyzgKe/jll8INjzSc+RDCbMXoFdSmGXTt9KHVMbZ+taGOeEIaY+UkYKk
 g1BAWiZZEedJZXZmltGPDOXuPdzmK1uMR13gjz/FS298ffmznsqfEAKkSuzxWsKH
 vheQnQjbEQoRNk4uI/mjxz/XYCEgnVqXX/9OlQ3WrJHWdOtBLEfykAem2RqeDGc7
 QF3EZ+IGCD4xPRJWbWXtJdUK7EtGTPrzyiKHEPnXsj8xYDo6oAYK+erychiSI7y5
 9htSyCWNZ5ccE8hqewKD5qN1XNqX6XmmmxgMX4i2w9fSdWFgzAkWPw/xTvw6BkfC
 Bn5b/9LcFIfTX9/oAlwh
 =Se2X
 -----END PGP SIGNATURE-----

Merge tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax fixes from Dan Williams:
 "A filesystem-dax and device-dax fix for v5.3.

  The filesystem-dax fix is tagged for stable as the implementation has
  been mistakenly throwing away all cow pages on any truncate or hole
  punch operation as part of the solution to coordinate device-dma vs
  truncate to dax pages.

  The device-dax change fixes up a regression this cycle from the
  introduction of a common 'internal per-cpu-ref' implementation.

  Summary:

   - Fix dax_layout_busy_page() to not discard private cow pages of
     fs/dax private mappings.

   - Update the memremap_pages core to properly cleanup on behalf of
     internal reference-count users like device-dax"

* tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  mm/memremap: Fix reuse of pgmap instances with internal references
  dax: dax_layout_busy_page() should not unmap cow pages
2019-08-11 13:15:10 -07:00
Linus Torvalds
829890d266 Fix incorrect lseek / fiemap results
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdTsiNAAoJENW/n+sDE2U6NAQP/3sffDXhuA3smLw41OjPs/wu
 kFbf0GdDrxouPfCEzpqdOaETcdsb9LVnJWmWKexDNjgzj4+5NdvwjD8UdEHUUsfp
 FBl2ALYobAmyEyKlzgcJ+CA40DMrsWWe8cqwcDqukz5nuyK/XwKJAtUa36TXhZx5
 1takFgd2FSpxCbBkyS/4SboG3xuXFyuKMJ3j3iSHiZFFlasf1WijLZEE4WnLeKTU
 BJYhf3SWPHKjwJb+vHJrXEwibluK4yfcYUPJUx+unZLsIoAw1DRj4uU6DrQTGc0t
 vJov4B56TQcwWIt3ZO24GrO0bZ7/xYruucgrYfhx8C9enWsN/bDcrcePeNJJaAd6
 fv2dmZAP/x5MQZWIz5wB2Kj3MwZ51gOru4pRJylJsqi2GwBt3TXyj/fXcOf3gs/6
 JjuxDxorg3lRrv67SFYzhmPxHYKhAEG+0pj+hci+BXoBnfY71Rs2qnzmynf3z6Yg
 /n5lZwfGRv0QqSaaDtQsXcyhjMAhewJhSabWwgFIc80xgARyA3KpsbIXRI8vrnFo
 bs07P8eROaPVaL4lkVJCil9nJhh2K+XZsal5QpUGHDKaM5NIT0JG+Iho5D34vrF/
 B62+GlAoaJwiOU4A86U7lY99zk7HYFuz5chkjNpFnrki3Y6umM+i4U2faO92kAIw
 /SCEBYH7oACnN4uwKxYZ
 =nHlX
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.3-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:
 "Fix incorrect lseek / fiemap results"

* tag 'gfs2-v5.3-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: gfs2_walk_metadata fix
2019-08-10 15:41:15 -07:00
Linus Torvalds
50e73a4a41 for-linus-20190809
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1NmNIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpouEEADA1e2HH9AO8QGWBVSeFi5ZdRnEXsJUfNGD
 d576M9empGI/hkbui0yY2tuodnMwmhUj641GTRC2dzl7RRyGcTMmqtGYPcyczlpU
 +6Yil3+XVNYTXRpUsKWs32H9aubNZl/L3rCcJImGvMgLW5YtEjAZJFIFIzXWWwDJ
 aZpTtnOH1+D3HH6HT35xk+aytSYZ7LsZ7X3LI9ZumKOUd2HJZGUkfWLXxgSuuUh/
 /WaBEQ9xzDNARmfx9Qb/2wSAE7XOInupPr86fI9dnmXHZ8rwhsvHRIZEvNIIqF5Z
 KzbF+rGJZ2bizKFpVFdlrIIyfBleFlQGFzYnGrs/+47zAl/CGicqmuAhGcOdGQXd
 wyj6G66iBcBIQC9hsYhPtglyQSk95tzsPZLZa7/1TlhEKl3mpor4w30NrLz/P5oy
 gdIivDhKP7aRFzBWw/2O4TOn3HhGxnZWVni6icOHE/pBQLBW12Ulc1nT0SiJ8RAt
 PYhMCFwz0Xc7pYuEfGZKwNSCIrx4i7f+spWEAt0Fget/KaB993zm+K4OzGbfBv6r
 FrZ5g/2MTVBJITmJVN2LDysF4EVEOzTULX+bCQOOWC7wmE/poCyNWj8gCpunIZsY
 V5DAB+cbEw/OLfpdiogXvCcxrbukKHV8AVdMdLYB5yEAuPN0JMqcunMjECxpNAha
 1wsv86ljvw==
 =Au30
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190809' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Revert of a bcache patch that caused an oops for some (Coly)

 - ata rb532 unused warning fix (Gustavo)

 - AoE kernel crash fix (He)

 - Error handling fixup for blkdev_get() (Jan)

 - libata read/write translation and SFF PIO fix (me)

 - Use after free and error handling fix for O_DIRECT fragments. There's
   still a nowait + sync oddity in there, we'll nail that start next
   week. If all else fails, I'll queue a revert of the NOWAIT change.
   (me)

 - Loop GFP_KERNEL -> GFP_NOIO deadlock fix (Mikulas)

 - Two BFQ regression fixes that caused crashes (Paolo)

* tag 'for-linus-20190809' of git://git.kernel.dk/linux-block:
  bcache: Revert "bcache: use sysfs_match_string() instead of __sysfs_match_string()"
  loop: set PF_MEMALLOC_NOIO for the worker thread
  bdev: Fixup error handling in blkdev_get()
  block, bfq: handle NULL return value by bfq_init_rq()
  block, bfq: move update of waker and woken list to queue freeing
  block, bfq: reset last_completed_rq_bfqq if the pointed queue is freed
  block: aoe: Fix kernel crash due to atomic sleep when exiting
  libata: add SG safety checks in SFF pio transfers
  libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
  block: fix O_DIRECT error handling for bio fragments
  ata: rb532_cf: Fix unused variable warning in rb532_pata_driver_probe
2019-08-09 09:28:18 -07:00
Andreas Gruenbacher
a27a0c9b6a gfs2: gfs2_walk_metadata fix
It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size.  This will confuse fiemap as well as lseek with the
SEEK_DATA flag.

Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.

Fixes xfstest generic/490.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
2019-08-09 16:56:12 +01:00
Linus Torvalds
b678c568c5 NFS client bugfixes for Linux 5.3
Highlights include:
 
 Stable fixes:
 - NFSv4: Ensure we check the return value of update_open_stateid() so we
   correctly track active open state.
 - NFSv4: Fix for delegation state recovery to ensure we recover all open
   modes that are active.
 - NFSv4: Fix an Oops in nfs4_do_setattr
 
 Bugfixes:
 - NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
 - NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
 - NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
 - pNFS: Report errors from the call to nfs4_select_rw_stateid()
 - NFSv4: Various other delegation and open stateid recovery fixes
 - NFSv4: Fix state recovery behaviour when server connection times out
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl1LKhEACgkQZwvnipYK
 APIAuw/9HnKwnJYKvkAv/Pg2eBQZAgwjchc/uPsfteSPr8PMFS889rsqvDoGrAI4
 VjZRh7Jsp/FPAlLzZKCnLF/fxKE83qxgS3MP14of9IoRv2gznsW7jexy48AhU/5t
 Ae4Wgu3GJJ0IIrr8hbrkJRkBUoYUMLguCNNaZC7LDLzEVQ0wNDAVpdsZ+gdnCcrw
 zhnFnz72p2h95tfL5QkJ+OYrAu4ikdlSjx2oOdLsUGFEAnTehpUPd3DPDiCQbctx
 zPHSGukj+8tsPJ+EUVuj7ouDJqTDyMFVe1eKRJWMIq22bUAM1GBtVVMw8uFXJi5i
 9WFUJIezHhkh3Hdx82ptUmt3u1hRuSolaKDICeIR2Kob0gUArqk0KR7upgVMS3Fn
 INm/c4Zsqsa1ABevQTLWqz+nVPUPRFGmEZfjvBwkmYlkKnqbjWxXQRkROt8UJS3O
 3vfK1hUEIUyt4uI2yHusru5nIQ3pv/h1WAwpfuSQFw+nEvC6YcssECz8uOhKEnEr
 UWnUxRP66YVL4L+VAsajzAArBDQ8cUU3bv6q0x2IEWA8CHHy0BH1MGM6cumVW8YQ
 rwj0KqR4+aH0u5g8EpOkODAJfuxo10oJi6MNlr2+OSn60tdmi/P1K8eg4ERODqya
 vGqgftjTfmViYfoaWpt2NDqAbKOu4wXovb6jWqbP2lUsh+ttojg=
 =IXqc
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - NFSv4: Ensure we check the return value of update_open_stateid() so
     we correctly track active open state.

   - NFSv4: Fix for delegation state recovery to ensure we recover all
     open modes that are active.

   - NFSv4: Fix an Oops in nfs4_do_setattr

  Fixes:

   - NFS: Fix regression whereby fscache errors are appearing on 'nofsc'
     mounts

   - NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()

   - NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid

   - pNFS: Report errors from the call to nfs4_select_rw_stateid()

   - NFSv4: Various other delegation and open stateid recovery fixes

   - NFSv4: Fix state recovery behaviour when server connection times
     out"

* tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Ensure state recovery handles ETIMEDOUT correctly
  NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
  NFSv4: Fix an Oops in nfs4_do_setattr
  NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
  NFSv4: Check the return value of update_open_stateid()
  NFSv4.1: Only reap expired delegations
  NFSv4.1: Fix open stateid recovery
  NFSv4: Report the error from nfs4_select_rw_stateid()
  NFSv4: When recovering state fails with EAGAIN, retry the same recovery
  NFSv4: Print an error in the syslog when state is marked as irrecoverable
  NFSv4: Fix delegation state recovery
  NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
2019-08-08 14:47:19 -07:00
Linus Torvalds
518a1c2f09 Merge tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Six small SMB3 fixes, two for stable"

* tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
  smb3: update TODO list of missing features
  smb3: send CAP_DFS capability during session setup
  SMB3: Fix potential memory leak when processing compound chain
  SMB3: Fix deadlock in validate negotiate hits reconnect
  cifs: fix rmmod regression in cifs.ko caused by force_sig changes
2019-08-08 09:57:50 -07:00
Jan Kara
e91455bad5 bdev: Fixup error handling in blkdev_get()
Commit 89e524c04f ("loop: Fix mount(2) failure due to race with
LOOP_SET_FD") converted blkdev_get() to use the new helpers for
finishing claiming of a block device. However the conversion botched the
error handling in blkdev_get() and thus the bdev has been marked as held
even in case __blkdev_get() returned error. This led to occasional
warnings with block/001 test from blktests like:

kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0

Correct the error handling.

CC: stable@vger.kernel.org
Fixes: 89e524c04f ("loop: Fix mount(2) failure due to race with LOOP_SET_FD")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08 07:37:03 -06:00
Jens Axboe
e15c2ffa10 block: fix O_DIRECT error handling for bio fragments
0eb6ddfb86 tried to fix this up, but introduced a use-after-free
of dio. Additionally, we still had an issue with error handling,
as reported by Darrick:

"I noticed a regression in xfs/747 (an unreleased xfstest for the
xfs_scrub media scanning feature) on 5.3-rc3.  I'll condense that down
to a simpler reproducer:

error-test: 0 209 linear 8:48 0
error-test: 209 1 error
error-test: 210 6446894 linear 8:48 210

Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO
for sector 209 and to pass the io to the scsi device everywhere else.

On 5.3-rc3, performing a directio pread of this range with a < 1M buffer
(in other words, a request for fewer than MAX_BIO_PAGES bytes) yields
EIO like you'd expect:

pread64(3, 0x7f880e1c7000, 1048576, 0)  = -1 EIO (Input/output error)
pread: Input/output error
+++ exited with 0 +++

But doing it with a larger buffer succeeds(!):

pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880
read 1146880/1146880 bytes at offset 0
1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec)
+++ exited with 0 +++

(Note that the part of the buffer corresponding to the dm-error area is
uninitialized)

On 5.3-rc2, both commands would fail with EIO like you'd expect.  The
only change between rc2 and rc3 is commit 0eb6ddfb86 ("block: Fix
__blkdev_direct_IO() for bio fragments").

AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets
split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB)
and a second one for the 96k after that."

Fix this by noting that it's always safe to dereference dio if we get
BLK_QC_T_EAGAIN returned, as end_io hasn't been run for that case. So
we can safely increment the dio size before calling submit_bio(), and
then decrement it on failure (not that it really matters, as the bio
and dio are going away).

For error handling, return to the original method of just using 'ret'
for tracking the error, and the size tracking in dio->size.

Fixes: 0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
Fixes: 6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-07 12:19:43 -06:00
Trond Myklebust
67e7b52d44 NFSv4: Ensure state recovery handles ETIMEDOUT correctly
Ensure that the state recovery code handles ETIMEDOUT correctly,
and also that we set RPC_TASK_TIMEOUT when recovering open state.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-07 12:55:11 -04:00
Qu Wenruo
07301df7d2 btrfs: trim: Check the range passed into to prevent overflow
Normally the range->len is set to default value (U64_MAX), but when it's
not default value, we should check if the range overflows.

And if it overflows, return -EINVAL before doing anything.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-08-07 16:42:39 +02:00
Filipe Manana
d7cd4dd907 Btrfs: fix sysfs warning and missing raid sysfs directories
In the 5.3 merge window, commit 7c7e301406 ("btrfs: sysfs: Replace
default_attrs in ktypes with groups"), we started using the member
"defaults_groups" for the kobject type "btrfs_raid_ktype". That leads
to a series of warnings when running some test cases of fstests, such
as btrfs/027, btrfs/124 and btrfs/176. The traces produced by those
warnings are like the following:

  [116648.059212] kernfs: can not remove 'total_bytes', no directory
  [116648.060112] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
  (...)
  [116648.066482] CPU: 3 PID: 28500 Comm: umount Tainted: G        W         5.3.0-rc3-btrfs-next-54 #1
  (...)
  [116648.069376] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
  (...)
  [116648.072385] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
  [116648.073437] RAX: 0000000000000000 RBX: ffffffffc0c11998 RCX: 0000000000000000
  [116648.074201] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
  [116648.074956] RBP: ffffffffc0b9ca2f R08: 0000000000000000 R09: 0000000000000001
  [116648.075708] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
  [116648.076434] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
  [116648.077143] FS:  00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
  [116648.077852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [116648.078546] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
  [116648.079235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [116648.079907] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [116648.080585] Call Trace:
  [116648.081262]  remove_files+0x31/0x70
  [116648.081929]  sysfs_remove_group+0x38/0x80
  [116648.082596]  sysfs_remove_groups+0x34/0x70
  [116648.083258]  kobject_del+0x20/0x60
  [116648.083933]  btrfs_free_block_groups+0x405/0x430 [btrfs]
  [116648.084608]  close_ctree+0x19a/0x380 [btrfs]
  [116648.085278]  generic_shutdown_super+0x6c/0x110
  [116648.085951]  kill_anon_super+0xe/0x30
  [116648.086621]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [116648.087289]  deactivate_locked_super+0x3a/0x70
  [116648.087956]  cleanup_mnt+0xb4/0x160
  [116648.088620]  task_work_run+0x7e/0xc0
  [116648.089285]  exit_to_usermode_loop+0xfa/0x100
  [116648.089933]  do_syscall_64+0x1cb/0x220
  [116648.090567]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [116648.091197] RIP: 0033:0x7f9cdc073b37
  (...)
  [116648.100046] ---[ end trace 22e24db328ccadf8 ]---
  [116648.100618] ------------[ cut here ]------------
  [116648.101175] kernfs: can not remove 'used_bytes', no directory
  [116648.101731] WARNING: CPU: 3 PID: 28500 at fs/kernfs/dir.c:1504 kernfs_remove_by_name_ns+0x75/0x80
  (...)
  [116648.105649] CPU: 3 PID: 28500 Comm: umount Tainted: G        W         5.3.0-rc3-btrfs-next-54 #1
  (...)
  [116648.107461] RIP: 0010:kernfs_remove_by_name_ns+0x75/0x80
  (...)
  [116648.109336] RSP: 0018:ffffabfd0090bd08 EFLAGS: 00010282
  [116648.109979] RAX: 0000000000000000 RBX: ffffffffc0c119a0 RCX: 0000000000000000
  [116648.110625] RDX: ffff9fff603a7a00 RSI: ffff9fff603978a8 RDI: ffff9fff603978a8
  [116648.111283] RBP: ffffffffc0b9ca41 R08: 0000000000000000 R09: 0000000000000001
  [116648.111940] R10: ffff9ffe1f72e1c0 R11: 0000000000000000 R12: ffffffffc0b94120
  [116648.112603] R13: ffffffffb3d9b4e0 R14: 0000000000000000 R15: dead000000000100
  [116648.113268] FS:  00007f9cdc78a2c0(0000) GS:ffff9fff60380000(0000) knlGS:0000000000000000
  [116648.113939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [116648.114607] CR2: 00007f9fc4747ab4 CR3: 00000005c7832003 CR4: 00000000003606e0
  [116648.115286] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [116648.115966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [116648.116649] Call Trace:
  [116648.117326]  remove_files+0x31/0x70
  [116648.117997]  sysfs_remove_group+0x38/0x80
  [116648.118671]  sysfs_remove_groups+0x34/0x70
  [116648.119342]  kobject_del+0x20/0x60
  [116648.120022]  btrfs_free_block_groups+0x405/0x430 [btrfs]
  [116648.120707]  close_ctree+0x19a/0x380 [btrfs]
  [116648.121396]  generic_shutdown_super+0x6c/0x110
  [116648.122057]  kill_anon_super+0xe/0x30
  [116648.122702]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [116648.123335]  deactivate_locked_super+0x3a/0x70
  [116648.123961]  cleanup_mnt+0xb4/0x160
  [116648.124586]  task_work_run+0x7e/0xc0
  [116648.125210]  exit_to_usermode_loop+0xfa/0x100
  [116648.125830]  do_syscall_64+0x1cb/0x220
  [116648.126463]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [116648.127080] RIP: 0033:0x7f9cdc073b37
  (...)
  [116648.135923] ---[ end trace 22e24db328ccadf9 ]---

These happen because, during the unmount path, we call kobject_del() for
raid kobjects that are not fully initialized, meaning that we set their
ktype (as btrfs_raid_ktype) through link_block_group() but we didn't set
their parent kobject, which is done through btrfs_add_raid_kobjects().

We have this split raid kobject setup since commit 75cb379d26
("btrfs: defer adding raid type kobject until after chunk relocation") in
order to avoid triggering reclaim during contextes where we can not
(either we are holding a transaction handle or some lock required by
the transaction commit path), so that we do the calls to kobject_add(),
which triggers GFP_KERNEL allocations, through btrfs_add_raid_kobjects()
in contextes where it is safe to trigger reclaim. That change expected
that a new raid kobject can only be created either when mounting the
filesystem or after raid profile conversion through the relocation path.
However, we can have new raid kobject created in other two cases at least:

1) During device replace (or scrub) after adding a device a to the
   filesystem. The replace procedure (and scrub) do calls to
   btrfs_inc_block_group_ro() which can allocate a new block group
   with a new raid profile (because we now have more devices). This
   can be triggered by test cases btrfs/027 and btrfs/176.

2) During a degraded mount trough any write path. This can be triggered
   by test case btrfs/124.

Fixing this by adding extra calls to btrfs_add_raid_kobjects(), not only
makes things more complex and fragile, can also introduce deadlocks with
reclaim the following way:

1) Calling btrfs_add_raid_kobjects() at btrfs_inc_block_group_ro() or
   anywhere in the replace/scrub path will cause a deadlock with reclaim
   because if reclaim happens and a transaction commit is triggered,
   the transaction commit path will block at btrfs_scrub_pause().

2) During degraded mounts it is essentially impossible to figure out where
   to add extra calls to btrfs_add_raid_kobjects(), because allocation of
   a block group with a new raid profile can happen anywhere, which means
   we can't safely figure out which contextes are safe for reclaim, as
   we can either hold a transaction handle or some lock needed by the
   transaction commit path.

So it is too complex and error prone to have this split setup of raid
kobjects. So fix the issue by consolidating the setup of the kobjects in a
single place, at link_block_group(), and setup a nofs context there in
order to prevent reclaim being triggered by the memory allocations done
through the call chain of kobject_add().

Besides fixing the sysfs warnings during kobject_del(), this also ensures
the sysfs directories for the new raid profiles end up created and visible
to users (a bug that existed before the 5.3 commit 7c7e301406
("btrfs: sysfs: Replace default_attrs in ktypes with groups")).

Fixes: 75cb379d26 ("btrfs: defer adding raid type kobject until after chunk relocation")
Fixes: 7c7e301406 ("btrfs: sysfs: Replace default_attrs in ktypes with groups")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-08-07 16:25:44 +02:00