Commit graph

54685 commits

Author SHA1 Message Date
Brian Foster
0bd6207f83 xfs: remove dfops param in attr fork add path
Now that the attribute fork add tx carries dfops along with the
transaction, it is unnecessary to pass it down the stack. Remove the
dfops parameter and access ->t_dfops directly where necessary. This
patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:10 -07:00
Brian Foster
40d03ac6aa xfs: use ->t_dfops for attr set/remove operations
Attach the local dfops to the transaction allocated for xattr add
and remove operations. Add an earlier initialization in
xfs_attr_remove() to ensure the structure is valid if it remains
unused at transaction commit time.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:09 -07:00
Brian Foster
813d08cb6d xfs: use ->t_dfops for recovery of [b|c]ui log items
Log recovery passes down a central dfops structure to recovery
handlers for bui and cui log items. Each of these handlers allocates
and commits a transaction and defers any remaining operations to be
completed by the main recovery sequence.

Since dfops outlives the transaction in this context, set and clear
->t_dfops appropriately such that the *_finish_item() paths and
below (i.e., xfs_bmapi*()) can expect to find the dfops in the
transaction without it being committed with the dfops attached. This
is required because transaction commit expects that an associated
dfops is finished and in this context the dfops may be populated at
commit time.
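
The set-and-clear pattern described above can be sketched in plain C. This is a userspace illustration under stated assumptions, not the XFS code: `sketch_dfops`, `sketch_trans`, `sketch_trans_commit()` and `sketch_recover_item()` are hypothetical names, and the commit check is an assumed stand-in for the expectation that an attached dfops is finished at commit time.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the deferred-ops and transaction structures. */
struct sketch_dfops {
	int pending;			/* operations still to be finished */
};

struct sketch_trans {
	struct sketch_dfops *t_dfops;	/* attached dfops, or NULL */
	int committed;
};

/* Assumed rule: commit refuses a transaction whose attached dfops is unfinished. */
static int sketch_trans_commit(struct sketch_trans *tp)
{
	if (tp->t_dfops && tp->t_dfops->pending > 0)
		return -1;
	tp->committed = 1;
	return 0;
}

/* Attach dfops for the lower layers, then detach it again before commit. */
static int sketch_recover_item(struct sketch_trans *tp, struct sketch_dfops *dfops)
{
	tp->t_dfops = dfops;	/* lower layers find dfops via the transaction */
	dfops->pending++;	/* work deferred to the main recovery sequence */
	tp->t_dfops = NULL;	/* detach: dfops outlives this transaction */
	return sketch_trans_commit(tp);
}
```

In this sketch the commit succeeds even though `pending` is non-zero, because the dfops is no longer attached when the transaction commits.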

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:09 -07:00
Brian Foster
c9cfdb3811 xfs: remove dfops param from high level dirname calls
All callers of the directory create, rename and remove interfaces
already associate the dfops with the transaction. Drop the dfops
parameters in these calls in preparation for further cleanups in the
layers below. This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:08 -07:00
Brian Foster
0e0417f3e5 xfs: remove dfops parameter from ifree call stack
The inode free callchain starting in xfs_inactive_ifree() already
associates its dfops with the transaction. It still passes the dfops
on the stack down through xfs_difree_inobt(), however.

Clean up the call stack and reference dfops directly from the
transaction. This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:07 -07:00
Brian Foster
6aa6718439 xfs: rename xfs_trans ->t_agfl_dfops to ->t_dfops
The ->t_agfl_dfops field is currently used to defer agfl block frees
from associated transaction contexts. While all known problematic
contexts have already been updated to use ->t_agfl_dfops, the
broader goal is to defer agfl frees from all callers that already use a
deferred operations structure. Further, the transaction field
facilitates a good amount of code clean up where the transaction and
dfops have historically been passed down through the stack
separately.

Rename the field to something more generic to prepare to use it as
such throughout XFS. This patch does not change behavior.
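
The cleanup this rename enables can be illustrated with a before/after pair. This is only a sketch: the `sketch_*` types and functions are hypothetical and merely mirror the idea of reaching the dfops through `->t_dfops` instead of a separate parameter.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real XFS structures. */
struct sketch_dfops {
	int nr_deferred;
};

struct sketch_trans {
	struct sketch_dfops *t_dfops;
};

/* Before: the transaction and the dfops travel down the stack separately. */
static void sketch_defer_free_old(struct sketch_trans *tp, struct sketch_dfops *dfops)
{
	(void)tp;
	dfops->nr_deferred++;
}

/* After: callees reach the dfops through the transaction. */
static void sketch_defer_free_new(struct sketch_trans *tp)
{
	tp->t_dfops->nr_deferred++;
}
```

Both variants update the same structure; the second simply needs one parameter fewer at every level of the call chain.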

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:07 -07:00
Brian Foster
8a74938649 xfs: cow unwritten conversion uses uninitialized dfops
A couple COW fork unwritten extent conversion helpers pass an
uninitialized dfops pointer to xfs_bmapi_write(). This does not
cause problems because conversion does not use a transaction or the
dfops structure for the COW fork.  Drop the uninitialized usage of
dfops in these codepaths and pass NULL along to xfs_bmapi_write()
instead.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:06 -07:00
Christoph Hellwig
98c1a7c0ec xfs: update my copyrights for the writeback and iomap code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:06 -07:00
Christoph Hellwig
82cb14175e xfs: add support for sub-pagesize writeback without buffer_heads
Switch to using the iomap_page structure for checking sub-page uptodate
status and track sub-page I/O completion status, and remove large
quantities of boilerplate code working around buffer heads.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:05 -07:00
Christoph Hellwig
9dc55f1389 iomap: add support for sub-pagesize buffered I/O without buffer heads
After already supporting a simple implementation of buffered writes for
the blocksize == PAGE_SIZE case in the last commit this adds full support
even for smaller block sizes.   There are three bits of per-block
information in the buffer_head structure that really matter for the iomap
read and write path:

 - uptodate status (BH_uptodate)
 - marked as currently under read I/O (BH_Async_Read)
 - marked as currently under write I/O (BH_Async_Write)

Instead of having new per-block structures this now adds a per-page
structure called struct iomap_page to track this information in a slightly
different form:

 - a bitmap for the per-block uptodate status.  For the worst case of a 64k
   page size system this bitmap needs to contain 128 bits.  For the
   typical 4k page size case it only needs 8 bits, although we still
   need a full unsigned long due to the way the atomic bitmap API works.
 - two atomic_t counters are used to track the outstanding read and write
   counts

There is quite a bit of boilerplate code as the buffered I/O path uses
various helper methods, but the actual code is very straightforward.
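
A rough userspace sketch of such a per-page structure follows. It assumes a 4k page and 512-byte minimum block size, uses C11 atomics in place of the kernel's atomic_t, and uses plain (non-atomic) bit operations for brevity; every `sketch_*` / `SKETCH_*` name is hypothetical and not taken from the kernel.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */
#define SKETCH_MIN_BLOCK	512u	/* assumed smallest block size */
#define SKETCH_NBLOCKS		(SKETCH_PAGE_SIZE / SKETCH_MIN_BLOCK)
#define SKETCH_LONG_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NLONGS \
	((SKETCH_NBLOCKS + SKETCH_LONG_BITS - 1) / SKETCH_LONG_BITS)

/* Per-page state: two I/O counters plus one uptodate bit per block. */
struct sketch_page_state {
	atomic_int read_count;		/* outstanding reads */
	atomic_int write_count;		/* outstanding writes */
	unsigned long uptodate[SKETCH_NLONGS];
};

/* Mark one block uptodate (not atomic in this sketch). */
static void sketch_set_uptodate(struct sketch_page_state *ps, unsigned int block)
{
	ps->uptodate[block / SKETCH_LONG_BITS] |= 1UL << (block % SKETCH_LONG_BITS);
}

static bool sketch_block_uptodate(const struct sketch_page_state *ps,
				  unsigned int block)
{
	return (ps->uptodate[block / SKETCH_LONG_BITS] >>
		(block % SKETCH_LONG_BITS)) & 1UL;
}

/* The page as a whole is uptodate only when every block is. */
static bool sketch_page_uptodate(const struct sketch_page_state *ps,
				 unsigned int nblocks)
{
	for (unsigned int i = 0; i < nblocks; i++)
		if (!sketch_block_uptodate(ps, i))
			return false;
	return true;
}
```

With these assumed sizes only 8 bits are needed, yet the bitmap still occupies one full unsigned long, matching the remark above.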

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:05 -07:00
Christoph Hellwig
ac8ee54669 xfs: allow writeback on pages without buffer heads
Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
equal to the page size, and deal with pages without buffer heads in
writeback.  Thanks to the previous refactoring this is basically trivial
now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:04 -07:00
Christoph Hellwig
8e1f065bea xfs: refactor the tail of xfs_writepage_map
Rejuggle how we deal with the different error vs non-error and have
ioends vs not have ioend cases to keep the fast path streamlined, and
the duplicate code at a minimum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:04 -07:00
Christoph Hellwig
1b65d3dd2d xfs: remove xfs_start_page_writeback
This helper only has two callers, one of them with a constant error
argument.  Remove it to make pending changes to the code a little easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:03 -07:00
Christoph Hellwig
6d465e8953 xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
This keeps it in a single place so it can be made optional more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:03 -07:00
Christoph Hellwig
3faed66764 xfs: don't look at buffer heads in xfs_add_to_ioend
Calculate all information for the bio based on the passed in information
without requiring a buffer_head structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:02 -07:00
Christoph Hellwig
889c65b3f6 xfs: remove the imap_valid flag
Simplify the way we check for a valid imap - we know we have a valid
mapping after xfs_map_blocks returned successfully, and we know we can
call xfs_imap_valid on any imap, as it will always fail on a
zero-initialized map.

We can also remove the xfs_imap_valid function and fold it into
xfs_map_blocks now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:02 -07:00
Christoph Hellwig
3345746ef3 xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
xfs_bmapi_read adds zero value in xfs_map_blocks.  Replace it with a
direct call to the low-level extent lookup function.

Note that we now always pass a 0 length to the trace points as we ask
for an unspecified len.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:02 -07:00
Christoph Hellwig
060d4eaa0b xfs: remove xfs_reflink_find_cow_mapping
We only have one caller left, and open coding the simple extent list
lookup in it allows us to make the code both more understandable and
reuse calculations and variables already present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:01 -07:00
Christoph Hellwig
c3a2f9fff1 xfs: remove the now unused XFS_BMAPI_IGSTATE flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:01 -07:00
Dave Chinner
e2f6ad4624 xfs: make xfs_writepage_map extent map centric
xfs_writepage_map() iterates over the bufferheads on a page to decide
what sort of IO to do and what actions to take.  However, when it comes
to reflink and deciding when it needs to execute a COW operation, we no
longer look at the bufferhead state but instead we ignore that and look
up internal state held in the COW fork extent list.

This means xfs_writepage_map() is somewhat confused. It does stuff, then
ignores it, then tries to handle the impedance mismatch by shovelling the
results inside the existing mapping code.  It works, but it's a bit of a
mess and it makes it hard to fix the cached map bug that the writepage
code currently has.

To unify the two different mechanisms, we first have to choose a direction.
That's already been set - we're de-emphasising bufferheads so they are no
longer a control structure as we need to do that to allow for eventual
removal.  Hence we need to move away from looking at bufferhead state to
determine what operations we need to perform.

We can't completely get rid of bufferheads yet - they do contain some
state that is absolutely necessary, such as whether that part of the page
contains valid data or not (buffer_uptodate()).  Other state in the
bufferhead is redundant:

	BH_dirty - the page is dirty, so we can ignore this and just
		write it
	BH_delay - we have delalloc extent info in the DATA fork extent
		tree
	BH_unwritten - same as BH_delay
	BH_mapped - indicates we've already used it once for IO and it is
		mapped to a disk address. Needs to be ignored for COW
		blocks.

The BH_mapped flag is an interesting case - it's supposed to indicate that
it's already mapped to disk and so we can just use it "as is".  In theory,
we don't even have to do an extent lookup to find where to write it to,
but we have to do that anyway to determine we are actually writing over a
valid extent.  Hence it's not even serving the purpose of avoiding an
extent lookup during writeback, and so we can pretty much ignore it.
Especially as we have to ignore it for COW operations...

Therefore, use the extent map as the source of information to tell us
what actions we need to take and what sort of IO we should perform.  The
first step is to have xfs_map_blocks() set the io type according to what
it looks up.  This means it can easily handle both normal overwrite and
COW cases.  The only thing we also need to add is the ability to return
hole mappings.

We need to return and cache hole mappings now for the case of multiple
blocks per page.  We no longer use the BH_mapped to indicate a block over
a hole, so we have to get that info from xfs_map_blocks().  We cache it so
that holes that span two pages don't need separate lookups.  This allows us
to avoid ever doing write IO over a hole, too.

Now that we have xfs_map_blocks() returning both a cached map and the type
of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
the bufferhead control. It's also much simplified because it doesn't need
to explicitly handle COW operations.  Instead of iterating bufferheads, it
iterates blocks within the page and then looks up what per-block state is
required from the appropriate bufferhead.  It then validates the cached
map, and if it's not valid, we get a new map.  If we don't get a valid map
or it's over a hole, we skip the block.
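
The per-block loop with a cached mapping can be sketched as below. This is an illustrative model only: the extent list is a plain array, and all `sketch_*` names, the `lookups` counter and the `uptodate` array are hypothetical stand-ins for the real extent tree, writepage context and bufferhead state.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_io_type { SKETCH_IO_HOLE, SKETCH_IO_OVERWRITE, SKETCH_IO_COW };

struct sketch_extent {
	unsigned int start;		/* first file block covered */
	unsigned int len;		/* number of blocks covered */
	enum sketch_io_type type;
};

/* Writepage context: the cached mapping survives from page to page. */
struct sketch_wpc {
	struct sketch_extent map;
	bool valid;
	unsigned int lookups;		/* how often a new map was fetched */
};

static bool sketch_map_covers(const struct sketch_wpc *wpc, unsigned int block)
{
	return wpc->valid && block >= wpc->map.start &&
	       block - wpc->map.start < wpc->map.len;
}

/* Look up the extent (possibly a hole) covering 'block' and cache it. */
static void sketch_map_blocks(struct sketch_wpc *wpc,
			      const struct sketch_extent *exts, size_t n,
			      unsigned int block)
{
	wpc->valid = false;
	wpc->lookups++;
	for (size_t i = 0; i < n; i++) {
		if (block >= exts[i].start && block - exts[i].start < exts[i].len) {
			wpc->map = exts[i];
			wpc->valid = true;
			return;
		}
	}
}

/* Returns the number of blocks on this page that would be queued for IO. */
static unsigned int sketch_writepage_map(struct sketch_wpc *wpc,
					 const struct sketch_extent *exts, size_t n,
					 unsigned int first_block,
					 unsigned int nblocks,
					 const bool *uptodate)
{
	unsigned int queued = 0;

	for (unsigned int i = 0; i < nblocks; i++) {
		unsigned int block = first_block + i;

		if (!uptodate[i])
			continue;	/* no valid data in this block */
		if (!sketch_map_covers(wpc, block))
			sketch_map_blocks(wpc, exts, n, block);
		if (!wpc->valid || wpc->map.type == SKETCH_IO_HOLE)
			continue;	/* no map, or over a hole: skip */
		queued++;
	}
	return queued;
}
```

Because the hole mapping stays cached in the context, a hole spanning two pages costs a single lookup in this model.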

At this point, we have to remap the bufferhead via xfs_map_at_offset().
As previously noted, we had to do this even if the buffer was already
mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-written
delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
Bufferheads that span such regions still need their BH_Delay flags cleared
and their block numbers calculated, so we now unconditionally map each
bufferhead before submission.

But wait! There's more - remember the old "treat unwritten extents as
holes on read" hack?  Yeah, that means we can have a dirty page with
unmapped, unwritten bufferheads that contain data!  What makes these so
special is that the unwritten "hole" bufferheads do not have a valid block
device pointer, so if we attempt to write them xfs_add_to_ioend() blows
up. So we make xfs_map_at_offset() do the "realtime or data device"
lookup from the inode and ignore what was or wasn't put into the
bufferhead when the buffer was instantiated.

The astute reader will have realised by now that this code treats
unwritten extents in multiple-blocks-per-page situations differently.
If we get any combination of unwritten blocks on a dirty page that contain
valid data in the page, we're going to convert them to real extents.  This
can actually be a win, because it means that pages with interleaving
unwritten and written blocks will get converted to a single written extent
with zeros replacing the interspersed unwritten blocks.  This is actually
good for reducing extent list and conversion overhead, and it means we
issue a contiguous IO instead of lots of little ones.  The downside is
that we use up a little extra IO bandwidth.  Neither of these seem like a
bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
IOs will result in better performance on them...

As a result of all this, the only state we actually care about from the
bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
pass some information to the bio via xfs_add_to_ioend(), but that is
trivial to separate and pass explicitly.  This means we really only need
1 bit of state per block per page from the buffered write path in the
writeback path.  Everything else we do with the bufferhead is purely to
make the buffered IO front end continue to work correctly. i.e. we've
pretty much marginalised bufferheads in the writeback path completely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: forward port, refactor and split off bits into other commits]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:00 -07:00
Christoph Hellwig
6a4c950136 xfs: rename the offset variable in xfs_writepage_map
Calling it file_offset makes the usage more clear, especially with
a new poffset variable that will be added soon for the offset inside
the page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:26:00 -07:00
Christoph Hellwig
5c665e5b5a xfs: remove xfs_map_cow
We can handle the existing cow mapping case as a special case directly
in xfs_writepage_map, and share code for allocating delalloc blocks
with regular I/O in xfs_map_blocks.  This means we need to always
call xfs_map_blocks for reflink inodes, but we can still skip most of
the work if it turns out that there is no COW mapping overlapping the
current block.

As a subtle detail we need to start caching holes in the wpc to deal
with the case of COW reservations between EOF.  But we'll need that
infrastructure later anyway, so this is no big deal.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:59 -07:00
Christoph Hellwig
fca8c80542 xfs: remove xfs_reflink_trim_irec_to_next_cow
We already have to check for overlapping COW extents every time we
come back to a page in xfs_writepage_map / xfs_map_cow, so this
additional trim is not required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:59 -07:00
Christoph Hellwig
a7b28f72ab xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks
We want to be able to use the extent state as a reliable indicator for
the type of I/O, and stop using the buffer head state.  For this we
need to stop using the XFS_BMAPI_IGSTATE so that we don't see merged
extents of different types.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:59 -07:00
Christoph Hellwig
c57371a16d xfs: don't clear imap_valid for a non-uptodate buffers
Finding a buffer that isn't uptodate doesn't invalidate the mapping for
any given block.  The last_sector check will already take care of starting
another ioend as soon as we find any non-uptodate buffer, and if the current
mapping doesn't include the next uptodate buffer the xfs_imap_valid check
will take care of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:58 -07:00
Christoph Hellwig
91cdfd1761 xfs: do not set the page uptodate in xfs_writepage_map
We already track the page uptodate status based on the buffer uptodate
status, which is updated whenever reading or zeroing blocks.

This code has been there since a ptool commit in 2002, which
claims to:

    "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
    major changes to actually fit.

and isn't present in other writepage implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:58 -07:00
Christoph Hellwig
d438017757 xfs: move locking into xfs_bmap_punch_delalloc_range
Both callers want the same locking, so do it only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:57 -07:00
Christoph Hellwig
0362572138 xfs: simplify xfs_aops_discard_page
Instead of looking at the buffer heads to see if a block is delalloc just
call xfs_bmap_punch_delalloc_range on the whole page - this will leave
any non-delalloc block intact and handle the iteration for us.  As a side
effect one more place stops caring about buffer heads and we can remove the
xfs_check_page_type function entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:57 -07:00
Christoph Hellwig
8b2e77c163 xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages
For file systems with a block size that equals the page size we never do
partial reads, so we can use the buffer_head-less iomap versions of
readpage and readpages without conflicting with the buffer_head structures
created later in write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11 22:25:56 -07:00
Darrick J. Wong
c2efdfc100 Merge branch 'iomap-4.19-merge' into xfs-4.19-merge
2018-07-11 22:24:40 -07:00
Linus Torvalds
70a2dc6abc Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Bug fixes for ext4; most of which relate to vulnerabilities where a
  maliciously crafted file system image can result in a kernel OOPS or
  hang.

  At least one fix addresses an inline data bug could be triggered by
  userspace without the need of a crafted file system (although it does
  require that the inline data feature be enabled)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: check superblock mapped prior to committing
  ext4: add more mount time checks of the superblock
  ext4: add more inode number paranoia checks
  ext4: avoid running out of journal credits when appending to an inline file
  jbd2: don't mark block as modified if the handle is out of credits
  ext4: never move the system.data xattr out of the inode body
  ext4: clear i_data in ext4_inode_info when removing inline data
  ext4: include the illegal physical block in the bad map ext4_error msg
  ext4: verify the depth of extent tree in ext4_find_extent()
  ext4: only look at the bg_flags field if it is valid
  ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
  ext4: always check block group bounds in ext4_init_block_bitmap()
  ext4: always verify the magic number in xattr blocks
  ext4: add corruption check in ext4_xattr_set_entry()
  ext4: add warn_on_error mount option
2018-07-08 11:10:30 -07:00
Linus Torvalds
b2d44d145d Merge tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Five smb3/cifs fixes for stable (including for some leaks and memory
  overwrites) and also a few fixes for recent regressions in packet
  signing.

  Additional testing at the recent SMB3 test event, and some good work
  by Paulo and others spotted the issues fixed here. In addition to my
  xfstest runs on these, Aurelien and Stefano did additional test runs
  to verify this set"

* tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf()
  cifs: Fix infinite loop when using hard mount option
  cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting
  cifs: Fix memory leak in smb2_set_ea()
  cifs: fix SMB1 breakage
  cifs: Fix validation of signed data in smb2
  cifs: Fix validation of signed data in smb3+
  cifs: Fix use after free of a mid_q_entry
2018-07-07 18:31:34 -07:00
Linus Torvalds
0fa3ecd878 Fix up non-directory creation in SGID directories
sgid directories have special semantics, making newly created files in
the directory belong to the group of the directory, and newly created
subdirectories will also become sgid.  This is historically used for
group-shared directories.

But group directories writable by non-group members should not imply
that such non-group members can magically join the group, so make sure
to clear the sgid bit on non-directories for non-members (but remember
that sgid without group execute means "mandatory locking", just to
confuse things even more).
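
The rule described above can be sketched as a small mode calculation. This is a simplified illustration, not the kernel change itself: `sketch_new_inode_mode()` and its boolean parameters are hypothetical, the two constants mirror the conventional octal values of the sgid and group-execute bits, and any capability-based exemptions are deliberately left out.

```c
#include <stdbool.h>

#define SKETCH_S_ISGID	02000u	/* set-group-ID bit */
#define SKETCH_S_IXGRP	00010u	/* group execute bit */

/*
 * Compute the mode of a new inode created in a directory with mode
 * 'dir_mode'. 'creator_in_group' says whether the creating process is a
 * member of the directory's group (assumed to be decided by the caller).
 */
static unsigned int sketch_new_inode_mode(unsigned int dir_mode,
					  unsigned int mode,
					  bool new_is_dir,
					  bool creator_in_group)
{
	if (!(dir_mode & SKETCH_S_ISGID))
		return mode;			/* not an sgid directory */

	if (new_is_dir)
		return mode | SKETCH_S_ISGID;	/* subdirectories inherit sgid */

	/*
	 * sgid together with group execute is a real set-group-ID file;
	 * sgid without group execute means "mandatory locking" and is kept.
	 */
	if ((mode & (SKETCH_S_ISGID | SKETCH_S_IXGRP)) ==
	    (SKETCH_S_ISGID | SKETCH_S_IXGRP) && !creator_in_group)
		mode &= ~SKETCH_S_ISGID;

	return mode;
}
```

In this model a non-member asking for mode 02755 in an sgid directory ends up with 0755, while a group member keeps 02755.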

Reported-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-05 12:36:36 -07:00
Stefano Brivio
729c0c9dd5 cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf()
smb{2,3}_create_lease_buf() store a lease key in the lease
context for later usage on a lease break.

In most paths, the key is currently sourced from data that
happens to be on the stack near local variables for oplock in
SMB2_open() callers, e.g. from open_shroot(), whereas
smb2_open_file() properly allocates space on its stack for it.

The address of those local variables holding the oplock is then
passed to create_lease_buf handlers via SMB2_open(), and 16
bytes near oplock are used. This causes a stack out-of-bounds
access as reported by KASAN on SMB2.1 and SMB3 mounts (first
out-of-bounds access is shown here):

[  111.528823] BUG: KASAN: stack-out-of-bounds in smb3_create_lease_buf+0x399/0x3b0 [cifs]
[  111.530815] Read of size 8 at addr ffff88010829f249 by task mount.cifs/985
[  111.532838] CPU: 3 PID: 985 Comm: mount.cifs Not tainted 4.18.0-rc3+ #91
[  111.534656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  111.536838] Call Trace:
[  111.537528]  dump_stack+0xc2/0x16b
[  111.540890]  print_address_description+0x6a/0x270
[  111.542185]  kasan_report+0x258/0x380
[  111.544701]  smb3_create_lease_buf+0x399/0x3b0 [cifs]
[  111.546134]  SMB2_open+0x1ef8/0x4b70 [cifs]
[  111.575883]  open_shroot+0x339/0x550 [cifs]
[  111.591969]  smb3_qfs_tcon+0x32c/0x1e60 [cifs]
[  111.617405]  cifs_mount+0x4f3/0x2fc0 [cifs]
[  111.674332]  cifs_smb3_do_mount+0x263/0xf10 [cifs]
[  111.677915]  mount_fs+0x55/0x2b0
[  111.679504]  vfs_kern_mount.part.22+0xaa/0x430
[  111.684511]  do_mount+0xc40/0x2660
[  111.698301]  ksys_mount+0x80/0xd0
[  111.701541]  do_syscall_64+0x14e/0x4b0
[  111.711807]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  111.713665] RIP: 0033:0x7f372385b5fa
[  111.715311] Code: 48 8b 0d 99 78 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 78 2c 00 f7 d8 64 89 01 48
[  111.720330] RSP: 002b:00007ffff27049d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[  111.722601] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f372385b5fa
[  111.724842] RDX: 000055c2ecdc73b2 RSI: 000055c2ecdc73f9 RDI: 00007ffff270580f
[  111.727083] RBP: 00007ffff2705804 R08: 000055c2ee976060 R09: 0000000000001000
[  111.729319] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f3723f4d000
[  111.731615] R13: 000055c2ee976060 R14: 00007f3723f4f90f R15: 0000000000000000

[  111.735448] The buggy address belongs to the page:
[  111.737420] page:ffffea000420a7c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[  111.739890] flags: 0x17ffffc0000000()
[  111.741750] raw: 0017ffffc0000000 0000000000000000 dead000000000200 0000000000000000
[  111.744216] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  111.746679] page dumped because: kasan: bad access detected

[  111.750482] Memory state around the buggy address:
[  111.752562]  ffff88010829f100: 00 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
[  111.754991]  ffff88010829f180: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
[  111.757401] >ffff88010829f200: 00 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 f2 f2 f2
[  111.759801]                                               ^
[  111.762034]  ffff88010829f280: f2 02 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
[  111.764486]  ffff88010829f300: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  111.766913] ==================================================================

Lease keys are however already generated and stored in fid data
on open and create paths: pass them down to the lease context
creation handlers and use them.
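
The shape of that fix can be sketched as follows: the handler receives an explicit pointer to a full-size key instead of deriving one from the address of the oplock variable. The `sketch_*` types, the field layout and the 16-byte constant name are assumptions for illustration, not the cifs definitions.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_LEASE_KEY_SIZE 16	/* 16 bytes, as in the description above */

/* Hypothetical per-open data that already carries a full lease key. */
struct sketch_fid {
	uint8_t lease_key[SKETCH_LEASE_KEY_SIZE];
};

/* Hypothetical lease context built for the open request. */
struct sketch_lease_ctx {
	uint8_t key[SKETCH_LEASE_KEY_SIZE];
	uint8_t oplock;
};

/*
 * The key comes from a buffer that is really SKETCH_LEASE_KEY_SIZE bytes
 * long, so nothing is read from the stack around the oplock variable.
 */
static void sketch_create_lease_buf(struct sketch_lease_ctx *ctx,
				    const uint8_t *lease_key, uint8_t oplock)
{
	memcpy(ctx->key, lease_key, SKETCH_LEASE_KEY_SIZE);
	ctx->oplock = oplock;
}
```

A caller would pass `fid.lease_key` together with the oplock value, keeping the two pieces of data independent of each other's storage.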

Suggested-by: Aurélien Aptel <aaptel@suse.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Fixes: b8c32dbb0d ("CIFS: Request SMB2.1 leases")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:25 -05:00
Paulo Alcantara
7ffbe65578 cifs: Fix infinite loop when using hard mount option
For every request we send, whether it is SMB1 or SMB2+, we attempt to
reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying
out the request.

So, while server->tcpStatus == CifsNeedReconnect, we wait for the
reconnection to succeed in wait_event_interruptible_timeout(). If it
returns, that means that either the condition evaluated to true, the
timeout elapsed, or it was interrupted by a signal.

Since we're not handling the case where the process woke up due to a
received signal (-ERESTARTSYS), the next call to
wait_event_interruptible_timeout() will _always_ fail and we end up
looping forever inside either cifs_reconnect_tcon() or smb2_reconnect().

Here's an example of how to trigger that:

$ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard

(break connection to server before executing the command below)
$ stat -f /mnt/test & sleep 140
[1] 2511

$ ps -aux -q 2511
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2511  0.0  0.0  12892  1008 pts/0    S    12:24   0:00 stat -f /mnt/test

$ kill -9 2511

(wait for a while; process is stuck in the kernel)
$ ps -aux -q 2511
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2511 83.2  0.0  12892  1008 pts/0    R    12:24  30:01 stat -f /mnt/test

Using the 'hard' mount option means that cifs.ko will keep retrying
indefinitely; however, we must allow the process to be killed, otherwise
it would hang the system.
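
The failure mode can be modelled in user space. The following is a minimal sketch, not the cifs.ko code: fake_server, fake_wait() and reconnect_wait() are hypothetical names, and the enum values merely stand in for the three ways an interruptible, timed wait can return. It only illustrates that a 'hard' retry loop has to bail out when the wait reports a signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of an interruptible, timed wait. */
enum wait_result {
	WAIT_INTERRUPTED = -1,	/* woken by a signal (models -ERESTARTSYS) */
	WAIT_TIMED_OUT = 0,	/* timeout elapsed, condition still false */
	WAIT_CONDITION = 1,	/* condition became true */
};

/* Hypothetical connection state used only by this model. */
struct fake_server {
	int waits_until_reconnect;	/* waits that time out before the link is back */
	bool signal_pending;		/* a fatal signal was sent to the caller */
	bool hard_mount;		/* 'hard' semantics: retry after a timeout */
	int waits;			/* number of waits performed so far */
};

static enum wait_result fake_wait(struct fake_server *srv)
{
	srv->waits++;
	if (srv->signal_pending)
		return WAIT_INTERRUPTED;
	if (srv->waits > srv->waits_until_reconnect)
		return WAIT_CONDITION;
	return WAIT_TIMED_OUT;
}

/* Returns 0 once reconnected, -1 if interrupted, -2 if a soft mount gives up. */
static int reconnect_wait(struct fake_server *srv)
{
	assert(srv != NULL);
	for (;;) {
		enum wait_result rc = fake_wait(srv);

		if (rc == WAIT_INTERRUPTED)
			return -1;	/* without this check the loop never ends */
		if (rc == WAIT_CONDITION)
			return 0;
		if (!srv->hard_mount)
			return -2;
		/* hard mount: keep retrying */
	}
}
```

With signal_pending set, reconnect_wait() returns after a single wait instead of spinning, which is the behaviour the fix is after.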

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:25 -05:00
Stefano Brivio
f46ecbd97f cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting
A "small" CIFS buffer is not big enough in general to hold a
setacl request for SMB2, and we end up overflowing the buffer in
send_set_info(). For instance:

 # mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl
 # touch /mnt/test/acltest
 # getcifsacl /mnt/test/acltest
 REVISION:0x1
 CONTROL:0x9004
 OWNER:S-1-5-21-2926364953-924364008-418108241-1000
 GROUP:S-1-22-2-1001
 ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
 ACL:S-1-22-2-1001:ALLOWED/0x0/R
 ACL:S-1-22-2-1001:ALLOWED/0x0/R
 ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
 ACL:S-1-1-0:ALLOWED/0x0/R
 # setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest

this setacl will cause the following KASAN splat:

[  330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs]
[  330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012

[  330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2
[  330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  330.784395] Call Trace:
[  330.784789]  dump_stack+0xc2/0x16b
[  330.786777]  print_address_description+0x6a/0x270
[  330.787520]  kasan_report+0x258/0x380
[  330.788845]  memcpy+0x34/0x50
[  330.789369]  send_set_info+0x4dd/0xc20 [cifs]
[  330.799511]  SMB2_set_acl+0x76/0xa0 [cifs]
[  330.801395]  set_smb2_acl+0x7ac/0xf30 [cifs]
[  330.830888]  cifs_xattr_set+0x963/0xe40 [cifs]
[  330.840367]  __vfs_setxattr+0x84/0xb0
[  330.842060]  __vfs_setxattr_noperm+0xe6/0x370
[  330.843848]  vfs_setxattr+0xc2/0xd0
[  330.845519]  setxattr+0x258/0x320
[  330.859211]  path_setxattr+0x15b/0x1b0
[  330.864392]  __x64_sys_setxattr+0xc0/0x160
[  330.866133]  do_syscall_64+0x14e/0x4b0
[  330.876631]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  330.878503] RIP: 0033:0x7ff2e507db0a
[  330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48
[  330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
[  330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a
[  330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818
[  330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4
[  330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007
[  330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550

[  330.901057] Allocated by task 1012:
[  330.902888]  kasan_kmalloc+0xa0/0xd0
[  330.904714]  kmem_cache_alloc+0xc8/0x1d0
[  330.906615]  mempool_alloc+0x11e/0x380
[  330.908496]  cifs_small_buf_get+0x35/0x60 [cifs]
[  330.910510]  smb2_plain_req_init+0x4a/0xd60 [cifs]
[  330.912551]  send_set_info+0x198/0xc20 [cifs]
[  330.914535]  SMB2_set_acl+0x76/0xa0 [cifs]
[  330.916465]  set_smb2_acl+0x7ac/0xf30 [cifs]
[  330.918453]  cifs_xattr_set+0x963/0xe40 [cifs]
[  330.920426]  __vfs_setxattr+0x84/0xb0
[  330.922284]  __vfs_setxattr_noperm+0xe6/0x370
[  330.924213]  vfs_setxattr+0xc2/0xd0
[  330.926008]  setxattr+0x258/0x320
[  330.927762]  path_setxattr+0x15b/0x1b0
[  330.929592]  __x64_sys_setxattr+0xc0/0x160
[  330.931459]  do_syscall_64+0x14e/0x4b0
[  330.933314]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  330.936843] Freed by task 0:
[  330.938588] (stack is not available)

[  330.941886] The buggy address belongs to the object at ffff88010d5e2800
 which belongs to the cache cifs_small_rq of size 448
[  330.946362] The buggy address is located 96 bytes inside of
 448-byte region [ffff88010d5e2800, ffff88010d5e29c0)
[  330.950722] The buggy address belongs to the page:
[  330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0
[  330.955665] flags: 0x17ffffc0008100(slab|head)
[  330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80
[  330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
[  330.963005] page dumped because: kasan: bad access detected

[  330.967039] Memory state around the buggy address:
[  330.969255]  ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  330.971833]  ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[  330.976956]                                            ^
[  330.979226]  ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  330.981755]  ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  330.984225] ==================================================================

Fix this by allocating a regular CIFS buffer in
smb2_plain_req_init() if the request command is SMB2_SET_INFO.
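
The size of the overflow can be read off the KASAN report above. As a small worked example (the helper name overflow_bytes() is made up for illustration; the three constants are the figures printed in the splat):

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the KASAN report above. */
#define SMALL_BUF_SIZE	448u	/* cifs_small_rq object size */
#define WRITE_OFFSET	96u	/* "located 96 bytes inside" the object */
#define WRITE_SIZE	696u	/* "Write of size 696" */

/* Bytes that a write of `len` at offset `off` spills past a `cap`-byte object. */
static size_t overflow_bytes(size_t cap, size_t off, size_t len)
{
	size_t end = off + len;

	assert(end >= off);	/* no wrap-around in this model */
	return end > cap ? end - cap : 0;
}
```

Here overflow_bytes(SMALL_BUF_SIZE, WRITE_OFFSET, WRITE_SIZE) is 96 + 696 - 448 = 344, so the memcpy() runs 344 bytes past the end of the small buffer.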

Reported-by: Jianhong Yin <jiyin@redhat.com>
Fixes: 366ed846df ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-and-tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:25 -05:00
Paulo Alcantara
6aa0c114ec cifs: Fix memory leak in smb2_set_ea()
This patch fixes a memory leak when doing a setxattr(2) in SMB2+.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-07-05 13:48:24 -05:00
Ronnie Sahlberg
81f39f951b cifs: fix SMB1 breakage
SMB1 mounting broke in commit 35e2cc1ba7
("cifs: Use correct packet length in SMB2_TRANSFORM header").
Fix it and also rename smb2_rqst_len to smb_rqst_len
to make it more obvious that the function is also called from
CIFS/SMB1.

Good job by Paulo reviewing and cleaning up Ronnie's original patch.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:24 -05:00
Paulo Alcantara
8de8c4608f cifs: Fix validation of signed data in smb2
Fixes: c713c8770f ("cifs: push rfc1002 generation down the stack")

We failed to validate signed data returned by the server because
__cifs_calc_signature() now expects to sign the actual data in iov but
we were also passing down the rfc1002 length.

Fix smb3_calc_signature() to calculate signature of rfc1002 length prior
to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
addition, there are a few cases where no rfc1002 length is passed so we
make sure there's one (iov_len == 4).
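
The ordering described above can be sketched in user space. This is only an illustration under stated assumptions: struct iov, digest_data() and digest_request() are hypothetical names, and FNV-1a is a stand-in for the real keyed signature algorithm. The point is that digesting the 4-byte length vector first and then handing only iov[1-N] to the helper covers the same bytes as one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O vector. */
struct iov {
	const void *base;
	size_t len;
};

#define FNV_INIT 2166136261u

/* Incremental FNV-1a, used here only as a placeholder digest. */
static uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Models the helper that expects to see only the actual data vectors. */
static uint32_t digest_data(uint32_t h, const struct iov *iov, size_t n)
{
	for (size_t i = 0; i < n; i++)
		h = fnv1a_update(h, iov[i].base, iov[i].len);
	return h;
}

/* Models the caller: digest the 4-byte length first, then pass iov[1-N]. */
static uint32_t digest_request(const struct iov *iov, size_t n)
{
	uint32_t h = FNV_INIT;

	assert(iov != NULL);
	if (n >= 2 && iov[0].len == 4) {
		h = fnv1a_update(h, iov[0].base, iov[0].len);
		iov++;
		n--;
	}
	return digest_data(h, iov, n);
}
```

Splitting a buffer into a 4-byte length vector plus data vectors gives the same digest as the unsplit buffer, so the length is covered exactly once.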

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:24 -05:00
Paulo Alcantara
27c32b49c3 cifs: Fix validation of signed data in smb3+
Fixes: c713c8770f ("cifs: push rfc1002 generation down the stack")

We failed to validate signed data returned by the server because
__cifs_calc_signature() now expects to sign the actual data in iov but
we were also passing down the rfc1002 length.

Fix smb3_calc_signature() to calculate signature of rfc1002 length prior
to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
addition, there are a few cases where no rfc1002 length is passed so we
make sure there's one (iov_len == 4).

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:24 -05:00
Lars Persson
696e420bb2 cifs: Fix use after free of a mid_q_entry
With protocol version 2.0 mounts we have seen crashes with corrupt mid
entries. Either the server->pending_mid_q list becomes corrupt with a
cyclic reference in one element or a mid object fetched by the
demultiplexer thread becomes overwritten during use.

Code review identified a race between the demultiplexer thread and the
request issuing thread. The demultiplexer thread seems to be written
with the assumption that it is the sole user of the mid object until
it calls the mid callback which either wakes the issuer task or
deletes the mid.

This assumption is not true because the issuer task can be woken up
earlier by a signal. If the demultiplexer thread has proceeded as far
as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
thread will happily end up calling cifs_delete_mid while the
demultiplexer thread still is using the mid object.

Inserting a delay in the cifs demultiplexer thread widens the race
window and makes reproduction of the race very easy:

		if (server->large_buf)
			buf = server->bigbuf;

+		usleep_range(500, 4000);

		server->lstrp = jiffies;

To resolve this I think the proper solution involves putting a
reference count on the mid object. This patch makes sure that the
demultiplexer thread holds a reference until it has finished
processing the transaction.
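
As a rough, single-threaded model of the idea (mid_entry, mid_get() and mid_put() are hypothetical names here, and real kernel code would need an atomic counter or a lock around the count):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a mid object; not the cifs.ko structure. */
struct mid_entry {
	int refcount;
	int state;
};

static int live_mids;	/* number of mid objects currently allocated */

/* Allocate a mid with one reference, owned by the issuer. */
static struct mid_entry *mid_alloc(void)
{
	struct mid_entry *mid = calloc(1, sizeof(*mid));

	if (mid) {
		mid->refcount = 1;
		live_mids++;
	}
	return mid;
}

/* Take an extra reference, e.g. when the demultiplexer looks the mid up. */
static void mid_get(struct mid_entry *mid)
{
	assert(mid != NULL && mid->refcount > 0);
	mid->refcount++;
}

/* Drop a reference; frees the mid and returns true when it was the last. */
static bool mid_put(struct mid_entry *mid)
{
	assert(mid != NULL && mid->refcount > 0);
	if (--mid->refcount > 0)
		return false;
	free(mid);
	live_mids--;
	return true;
}
```

If the issuer is woken early and drops its reference, the object stays valid until the demultiplexer drops its own, so the late write to the mid no longer touches freed memory.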

Cc: stable@vger.kernel.org
Signed-off-by: Lars Persson <larper@axis.com>
Acked-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05 13:48:24 -05:00
Linus Torvalds
d02d21ea00 autofs: rename 'autofs' module back to 'autofs4'
It turns out that systemd has a bug: it wants to load the autofs module
early because of some initialization ordering with udev, and it doesn't
do that correctly.  Everywhere else it does the proper "look up module
name" that does the proper alias resolution, but in that early code, it
just uses a hardcoded "autofs4" for the module name.

The result of that is that as of commit a2225d931f ("autofs: remove
left-over autofs4 stubs"), you get

    systemd[1]: Failed to insert module 'autofs4': No such file or directory

in the system logs, and a lack of module loading.  All this despite the
fact that we had very clearly marked 'autofs4' as an alias for this
module.

What's so ridiculous about this is that literally everything else does
the module alias handling correctly, including really old versions of
systemd (that just used 'modprobe' to do this), and even all the other
systemd module loading code.

Only that special systemd early module load code is broken, hardcoding
the module names for not just 'autofs4', but also "ipv6", "unix",
"ip_tables" and "virtio_rng".  Very annoying.

Instead of creating an _additional_ separate compatibility 'autofs4'
module, just rely on the fact that everybody else gets this right, and
just call the module 'autofs4' for compatibility reasons, with 'autofs'
as the alias name.

That will allow the systemd people to fix their bugs, adding the proper
alias handling, and maybe even fix the name of the module to be just
"autofs" (so that they can _test_ the alias handling).  And eventually,
we can revert this silly compatibility hack.

See also

    https://github.com/systemd/systemd/issues/9501
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902946

for the systemd bug reports upstream and in the Debian bug tracker
respectively.

Fixes: a2225d931f ("autofs: remove left-over autofs4 stubs")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: Michael Biebl <biebl@debian.org>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-05 11:35:04 -07:00
Janosch Frank
1e2c043628 userfaultfd: hugetlbfs: fix userfaultfd_huge_must_wait() pte access
Use huge_ptep_get() to translate huge ptes to normal ptes so we can
check them with the huge_pte_* functions.  Otherwise some architectures
will check the wrong values and will not wait for userspace to bring in
the memory.

Link: http://lkml.kernel.org/r/20180626132421.78084-1-frankja@linux.ibm.com
Fixes: 369cd2121b ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-03 17:32:18 -07:00
Andreas Gruenbacher
806a1477b1 iomap: add inline data support to iomap_readpage_actor
Just copy the inline data into the page using the existing helper.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-03 09:07:47 -07:00
Andreas Gruenbacher
ec181f6782 iomap: support direct I/O to inline data
Add support for reading from and writing to inline data to iomap_dio_rw.
This saves filesystems from having to implement fallback code for this
case.

The inline data is actually cached in the inode, so the I/O is only
direct in the sense that it doesn't go through the page cache.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-03 09:07:47 -07:00
Christoph Hellwig
09230435df iomap: refactor iomap_dio_actor
Split the function up into two helpers for the bio based I/O and hole
case, and a small helper to call the two.  This separates the code a
little better in preparation for supporting I/O to inline data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-03 09:07:46 -07:00
Jon Derrick
a17712c8e4 ext4: check superblock mapped prior to committing
This patch attempts to close a hole leading to a BUG seen with hot
removals during writes [1].

A block device (NVME namespace in this test case) is formatted to EXT4
without partitions. It's mounted and write I/O is run to a file, then
the device is hot removed from the slot. An attempt is then made to write
the superblock to the drive, which is no longer present.

The typical chain of events leading to the BUG:
ext4_commit_super()
  __sync_dirty_buffer()
    submit_bh()
      submit_bh_wbc()
        BUG_ON(!buffer_mapped(bh));

This fix checks for the superblock's buffer head being mapped prior to
syncing.

[1] https://www.spinics.net/lists/linux-ext4/msg56527.html

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-07-02 18:45:18 -04:00
Linus Torvalds
d3bc0e67f8 for-4.18-rc2-tag

Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We have a few regression fixes for qgroup rescan status tracking and
  the vm_fault_t conversion that mixed up the error values"

* tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix mount failure when qgroup rescan is in progress
  Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
  btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
2018-07-01 12:38:16 -07:00
Linus Torvalds
4a770e638f Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Followup to procfs-seq_file series this window"

This fixes a memory leak by making sure that proc seq files release any
private data on close.  The 'proc_seq_open' has to be properly paired
with 'proc_seq_release' that releases the extra private data.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc: add proc_seq_release
2018-07-01 12:32:19 -07:00
Linus Torvalds
7886953859 A trivial dentry leak fix from Zheng.

Merge tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A trivial dentry leak fix from Zheng"

* tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client:
  ceph: fix dentry leak in splice_dentry()
2018-06-29 12:19:47 -07:00
Linus Torvalds
a11e1d432b Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL
The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 10:40:47 -07:00
Filipe Manana
e4e7ede739 Btrfs: fix mount failure when qgroup rescan is in progress
If a power failure happens while the qgroup rescan kthread is running,
the next mount operation will always fail. This is because of a recent
regression that makes qgroup_rescan_init() incorrectly return -EINVAL
when we are mounting the filesystem (through btrfs_read_qgroup_config()).
This causes the -EINVAL error to be returned regardless of any qgroup
flags being set instead of returning the error only when neither of
the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
is set.

A test case for fstests will follow soon.

Fixes: 9593bf4967 ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:57 +02:00
Chris Mason
717beb96d9 Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
The vm_fault_t conversion commit introduced a ret2 variable for tracking
the integer return values from internal btrfs functions.  It was
sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
and had been removed from the radix tree.  Something like this:

    ret2 = btrfs_delalloc_reserve_space() // returns zero on success

    lock_page(page)
    if (page->mapping != inode->i_mapping)
	goto out_unlock;

...

out_unlock:
    if (!ret2) {
	    ...
	    return VM_FAULT_LOCKED;
    }

This ends up triggering this WARNING in btrfs_destroy_inode()
    WARN_ON(BTRFS_I(inode)->block_rsv.size);

xfstests generic/095 was able to reliably reproduce the errors.

Since out_unlock: is only used for errors, this fix moves it below the
if (!ret2) check we use to return VM_FAULT_LOCKED for success.
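
The effect of moving the label can be shown with a stripped-down model. This is a sketch only: FAULT_* are made-up constants rather than the real VM_FAULT_* values, and the two functions keep just the control flow described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for this model. */
enum { FAULT_ERROR = 0, FAULT_LOCKED = 1, FAULT_NOPAGE = 2 };

/* Before the fix: the error label sits above the success check. */
static int mkwrite_buggy(int reserve_rc, bool page_valid)
{
	int ret2 = reserve_rc;	/* zero on success */

	if (ret2)
		return FAULT_ERROR;
	if (!page_valid)
		goto out_unlock;
	/* ... normal work on a valid page ... */
out_unlock:
	if (!ret2)
		return FAULT_LOCKED;	/* also reached for invalid pages */
	return FAULT_NOPAGE;
}

/* After the fix: the label is below the success check, so it is error-only. */
static int mkwrite_fixed(int reserve_rc, bool page_valid)
{
	int ret2 = reserve_rc;

	if (ret2)
		return FAULT_ERROR;
	if (!page_valid)
		goto out_unlock;
	/* ... normal work on a valid page ... */
	if (!ret2)
		return FAULT_LOCKED;
out_unlock:
	/* error path: unlock the page and release the reservation */
	return FAULT_NOPAGE;
}
```

For an invalid page with a successful reservation, the first variant still reports FAULT_LOCKED while the second takes the error path.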

Fixes: a528a24150 ("btrfs: change return type of btrfs_page_mkwrite to vm_fault_t")
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:50 +02:00
Qu Wenruo
6f7de19ed3 btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
Commit ff3d27a048 ("btrfs: qgroup: Finish rescan when hit the last leaf
of extent tree") added a new exit for rescan finish.

However, after finishing a quota rescan, we set
fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the
original exit path, while the new exit path misses that assignment of
(u64)-1.

The end result is that the quota status item doesn't have the same value
(-1 vs. the last bytenr + 1).
Although it doesn't affect quota accounting, it's still better to keep
the original behavior.

Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Fixes: ff3d27a048 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-28 11:30:48 +02:00
Chunyu Hu
877f919e19 proc: add proc_seq_release
kmemleak reported a memory leak when reading proc files. After adding
some debug lines, we found that proc_seq_fops uses seq_release as its
release handler, which does not free the 'private' field of the
seq_file, while the open handler proc_seq_open can create
the private data with __seq_open_private when state_size is greater
than zero. So after reading files created with proc_create_seq_private,
such as /proc/timer_list and /proc/vmallocinfo, the private memory of a
seq_file is not freed. Fix it by adding the paired proc_seq_release
as the default release handler of proc_seq_fops instead of seq_release.
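
The pairing problem can be modelled in a few lines of user-space C. The names below (seq_file_model, model_open() and the two release helpers) are hypothetical and only mirror the roles described above; they are not the procfs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a seq_file with optional private state. */
struct seq_file_model {
	void *private_data;
};

static int live_private;	/* private allocations not yet freed */

/* Models an open handler that allocates private state when state_size > 0. */
static int model_open(struct seq_file_model *sf, size_t state_size)
{
	assert(sf != NULL);
	sf->private_data = NULL;
	if (state_size > 0) {
		sf->private_data = calloc(1, state_size);
		if (!sf->private_data)
			return -1;
		live_private++;
	}
	return 0;
}

/* Models a plain release handler: it knows nothing about private state. */
static void model_release_plain(struct seq_file_model *sf)
{
	(void)sf;	/* private_data is left allocated: this is the leak */
}

/* Models the paired release handler: frees the private state if present. */
static void model_release_private(struct seq_file_model *sf)
{
	if (sf->private_data) {
		free(sf->private_data);
		sf->private_data = NULL;
		live_private--;
	}
}
```

Opening with a non-zero state_size and closing through the plain handler leaves live_private at 1; only the paired handler brings it back to 0.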

Fixes: 44414d82cf ("proc: introduce proc_create_seq_private")
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-27 20:44:38 -04:00
Linus Torvalds
f57494321c Changes since last update:
- More metadata validation strengthening to prevent crashes.
 - Fix extent offset overflow problem when insert_range on a 512b block fs
 - Fix some off-by-one errors in the realtime fsmap code
 - Fix some math errors in the default resblks calculation when free space
   is low
 - Fix a problem where stale page contents are exposed via mmap read
   after a zero_range at eof
 - Fix accounting problems with per-ag reservations causing statfs
   reports to vary incorrectly

Merge tag 'xfs-4.18-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Here are some patches for 4.18 to fix regressions, accounting
  problems, overflow problems, and to strengthen metadata validation to
  prevent corruption.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Changes since last update:

   - more metadata validation strengthening to prevent crashes.

   - fix extent offset overflow problem when insert_range on a 512b
     block fs

   - fix some off-by-one errors in the realtime fsmap code

   - fix some math errors in the default resblks calculation when free
     space is low

   - fix a problem where stale page contents are exposed via mmap read
     after a zero_range at eof

   - fix accounting problems with per-ag reservations causing statfs
     reports to vary incorrectly"

* tag 'xfs-4.18-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
  xfs: ensure post-EOF zeroing happens after zeroing part of a file
  xfs: fix off-by-one error in xfs_rtalloc_query_range
  xfs: fix uninitialized field in rtbitmap fsmap backend
  xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
  xfs: don't allow insert-range to shift extents past the maximum offset
  xfs: don't trip over negative free space in xfs_reserve_blocks
  xfs: allow empty transactions while frozen
  xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
  xfs: More robust inode extent count validation
  xfs: simplify xfs_bmap_punch_delalloc_range
2018-06-27 12:21:06 -07:00
Yan, Zheng
8b8f53af1e ceph: fix dentry leak in splice_dentry()
In any case, d_splice_alias() does not drop the reference to the original
dentry.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-26 18:42:44 +02:00
Linus Torvalds
84bfed40fc for-4.18-rc1-tag

Merge tag 'for-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two regression fixes and an incorrect error value propagation fix from
  'rename exchange'"

* tag 'for-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix return value on rename exchange failure
  btrfs: fix invalid-free in btrfs_extent_same
  Btrfs: fix physical offset reported by fiemap for inline extents
2018-06-26 08:41:54 -07:00
Darrick J. Wong
d8cb5e4237 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation
In __xfs_ag_resv_init we incorrectly calculate the amount by which to
decrease fdblocks when reserving blocks for the rmapbt.  Because rmapbt
allocations do not decrease fdblocks, we must decrease fdblocks by the
entire size of the requested reservation in order to achieve our goal of
always having enough free blocks to satisfy an rmapbt expansion.

This is in contrast to the refcountbt/finobt, which /do/ subtract from
fdblocks whenever they allocate a block.  For this allocation type we
preserve the existing behavior where we decrease fdblocks only by the
requested reservation minus the size of the existing tree.

This fixes the problem where the available block counts reported by
statfs change across a remount if there had been an rmapbt size change
since mount time.
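
A small arithmetic model shows why the old calculation made statfs vary. This is a sketch with made-up helper names and numbers (1000 free blocks, a 100-block reservation, a tree growing from 30 to 50 blocks); it assumes, as described above, that rmapbt allocations leave fdblocks untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Blocks hidden from fdblocks when the tree's own allocations reduce fdblocks. */
static uint64_t hidden_minus_used(uint64_t ask, uint64_t used)
{
	return ask > used ? ask - used : 0;
}

/* Blocks to hide from fdblocks when a per-AG reservation is set up. */
static uint64_t resv_hidden_blocks(bool is_rmapbt, uint64_t ask, uint64_t used)
{
	assert(ask > 0);
	if (is_rmapbt)
		return ask;	/* rmapbt: hide the entire reservation */
	return hidden_minus_used(ask, used);	/* refcountbt/finobt */
}

/* Free blocks that statfs would report after hiding `hidden` blocks. */
static uint64_t visible_free(uint64_t fdblocks, uint64_t hidden)
{
	return fdblocks > hidden ? fdblocks - hidden : 0;
}
```

With ask = 100, hiding "ask - used" for the rmapbt reports 930 free blocks when mounted with used = 30 but 950 after a remount with used = 50, even though fdblocks itself never changed; hiding the full 100 reports 900 in both cases.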

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-06-24 12:00:12 -07:00
Darrick J. Wong
e53c4b5983 xfs: ensure post-EOF zeroing happens after zeroing part of a file
If a user asks us to zero_range part of a file, the end of the range is
EOF, and not aligned to a page boundary, invoke writeback of the EOF
page to ensure that the post-EOF part of the page is zeroed.  This
ensures that we don't expose stale memory contents via mmap, if in a
clumsy manner.

Found by running generic/127 when it runs zero_range and mapread at EOF
one after the other.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
a3a374bf18 xfs: fix off-by-one error in xfs_rtalloc_query_range
In commit 8ad560d256 ("xfs: strengthen rtalloc query range checks")
we strengthened the input parameter checks in the rtbitmap range query
function, but introduced an off-by-one error in the process.  The call
to xfs_rtfind_forw deals with the high key being rextents, but we clamp
the high key to rextents - 1.  This causes the returned results to stop
one block short of the end of the rtdev, which is incorrect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
232d0a24b0 xfs: fix uninitialized field in rtbitmap fsmap backend
Initialize the extent count field of the high key so that when we use
the high key to synthesize an 'unknown owner' record (i.e. used space
record) at the end of the queried range we have a field with which to
compute rm_blockcount.  This is not strictly necessary because the
synthesizer never uses the rm_blockcount field, but we can shut up the
static code analysis anyway.

Coverity-id: 1437358
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
5bd88d1539 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write
The reflink iflag could have changed since the earlier unlocked check,
so if we got ILOCK_SHARED for a write but we're now a reflink inode
we have to switch to ILOCK_EXCL and relock.
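
The check, lock, recheck pattern described above can be sketched in Python. Everything here is hypothetical: the `Inode` class, the string lock modes, and the `before_lock` hook that simulates a racing change are stand-ins, not the XFS locking API.

```python
class Inode:
    def __init__(self, is_reflink=False):
        self.is_reflink = is_reflink
        self.lock_mode = None            # None, "shared" or "excl"


def lock_for_write(inode, before_lock=None):
    """Take the inode lock for a write and return the mode finally held.

    `before_lock` is a hypothetical hook that lets a caller change the
    inode between the unlocked check and the lock acquisition.
    """
    mode = "excl" if inode.is_reflink else "shared"   # unlocked check
    if before_lock is not None:
        before_lock(inode)                            # simulated race
    inode.lock_mode = mode
    if mode == "shared" and inode.is_reflink:
        # The flag changed under us: drop the lock and retake it exclusively.
        inode.lock_mode = None
        mode = "excl"
        inode.lock_mode = mode
    return mode
```

Without the recheck, the simulated race would leave the caller holding the shared mode on a reflink inode, which is the state the assertion below complains about.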

This helps us avoid blowing lock assertions in things like generic/166:

XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:assfail+0x25/0x30 [xfs]
Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
Call Trace:
 xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
 xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
 ? iomap_to_fiemap+0x80/0x80
 iomap_apply+0x5e/0x130
 iomap_dio_rw+0x2e0/0x400
 ? iomap_to_fiemap+0x80/0x80
 ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
 xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
 xfs_file_write_iter+0x7b/0xb0 [xfs]
 __vfs_write+0x16f/0x1f0
 vfs_write+0xc8/0x1c0
 ksys_pwrite64+0x74/0x90
 do_syscall_64+0x56/0x180
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
f62cb48e43 xfs: don't allow insert-range to shift extents past the maximum offset
Zorro Lang reports that generic/485 blows an assert on a filesystem with
512 byte blocks.  The test tries to fallocate a post-eof extent at the
maximum file size and calls insert range to shift the extents right by
two blocks.  On a 512b block filesystem this causes startoff to overflow
the 54-bit startoff field, leading to the assert.

Therefore, always check the rightmost extent to see if it would overflow
prior to invoking the insert range machinery.
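
A minimal sketch of such a pre-check, in Python. The 54-bit width comes from the text above; the function names and the use of `OverflowError` in place of an errno return are assumptions for illustration.

```python
STARTOFF_BITS = 54
MAX_STARTOFF = (1 << STARTOFF_BITS) - 1   # largest value the field can hold


def can_shift_right(last_extent_startoff, shift_blocks):
    """Return True if the rightmost extent still fits after the shift."""
    return last_extent_startoff + shift_blocks <= MAX_STARTOFF


def insert_range(extent_startoffs, shift_blocks):
    """Shift every extent right, refusing up front if any would overflow.

    Only the rightmost extent needs checking because it has the largest
    offset.  Raises OverflowError as a stand-in for an errno return.
    """
    if extent_startoffs and not can_shift_right(max(extent_startoffs),
                                                shift_blocks):
        raise OverflowError("shift would overflow the startoff field")
    return [off + shift_blocks for off in extent_startoffs]
```

Checking before any extent is moved means a rejected request leaves the extent list untouched.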

Reported-by: zlang@redhat.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
aafe12cee0 xfs: don't trip over negative free space in xfs_reserve_blocks
If we somehow end up with a filesystem that has fewer free blocks than
the blocks set aside to avoid ENOSPC deadlocks, it's possible that the
free space calculation in xfs_reserve_blocks will spit out a negative
number (because percpu_counter_sum returns s64).  We fail to notice
this negative number and set fdblks_delta to it.  Now we increment
fdblocks(!) and the unsigned type of m_resblks means that we end up
setting a ridiculously huge m_resblks reservation.

Avoid this comedy of errors by detecting the negative free space and
returning -ENOSPC.
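
The signed-to-unsigned wraparound can be reproduced with a small Python model. This is a simplified sketch: the function names are hypothetical, the existing reservation is assumed to be zero, and 64-bit unsigned storage is emulated with a mask.

```python
import errno

U64_MASK = (1 << 64) - 1


def to_u64(value):
    """Reinterpret a signed integer as an unsigned 64-bit quantity."""
    return value & U64_MASK


def reserve_blocks(fdblocks, set_aside, request):
    """Pick a reservation size from the available free space.

    Returns (error, reserved_blocks).
    """
    free = fdblocks - set_aside          # signed, may go negative
    if free < 0:
        return -errno.ENOSPC, 0          # the kind of check the fix adds
    return 0, min(request, free)


def reserve_blocks_unchecked(fdblocks, set_aside, request):
    """Same calculation without the negative check, stored as u64."""
    free = fdblocks - set_aside
    delta = free if free < request else request   # picks the negative value
    return to_u64(delta)
```

With 10 free blocks and 16 set aside, the unchecked variant yields 2**64 - 6 blocks, while the checked variant returns -ENOSPC.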

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Darrick J. Wong
10ee25268e xfs: allow empty transactions while frozen
In commit e89c041338 ("xfs: implement the GETFSMAP ioctl") we
created the ability to obtain empty transactions.  These transactions
have no log or block reservations and therefore can't modify anything.
Since they're also NO_WRITECOUNT they can run while the fs is frozen,
so we don't need to WARN_ON about that usage.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:35 -07:00
Filipe Manana
c5b4a50b74 Btrfs: fix return value on rename exchange failure
If we failed during a rename exchange operation after starting/joining a
transaction, we would end up replacing the return value, stored in the
local 'ret' variable, with the return value from btrfs_end_transaction().
So this could end up returning 0 (success) to user space despite the
operation having failed and aborted the transaction, because if there are
multiple tasks having a reference on the transaction at the time
btrfs_end_transaction() is called by the rename exchange, that function
returns 0 (otherwise it returns -EIO and not the original error value).
So fix this by not overwriting the return value on error after getting
a transaction handle.
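
The difference between overwriting and preserving the error can be sketched in Python. The `end_transaction` stand-in below is hypothetical and only mirrors the behaviour described above (0 when other tasks still hold an aborted transaction, -EIO otherwise); it is not the btrfs function.

```python
import errno


def end_transaction(aborted, other_holders):
    """Stand-in for ending a transaction handle (hypothetical)."""
    if aborted and not other_holders:
        return -errno.EIO
    return 0


def rename_exchange_buggy(op_error, other_holders):
    ret = op_error
    ret = end_transaction(op_error != 0, other_holders)  # original error lost
    return ret


def rename_exchange_fixed(op_error, other_holders):
    ret = op_error
    ret2 = end_transaction(op_error != 0, other_holders)
    return ret if ret else ret2          # keep the first error, if any
```

In the buggy variant a failed operation can report success whenever another task still holds the transaction; the fixed variant always reports the original error.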

Fixes: cdd1fedf82 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-22 12:59:08 +02:00
Linus Torvalds
894b8c000a Merge tag 'for_v4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf, quota, ext2 fixes from Jan Kara:
 "UDF:
   - fix an oops due to corrupted disk image
   - two small cleanups

  quota:
   - a fix for lru handling
   - cleanup

  ext2:
   - a warning about a deprecated mount option"

* tag 'for_v4.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Drop unused arguments of udf_delete_aext()
  udf: Provide function for calculating dir entry length
  udf: Detect incorrect directory size
  ext2: add warning when specifying nocheck option
  quota: Cleanup list iteration in dqcache_shrink_scan()
  quota: reclaim least recently used dquots
2018-06-22 18:04:56 +09:00
Dave Chinner
e53946dbd3 xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
When a corrupt inode is detected during xfs_iflush_cluster, we can
get a shutdown ASSERT failure like this:

XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
XFS (pmem1): Unmount and run xfs_repair
XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
XFS (pmem1): Please umount the filesystem and rectify the problem(s)
XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
.....
Call Trace:
 xfs_iflush_abort+0x10a/0x110
 xfs_iflush+0xf3/0x390
 xfs_inode_item_push+0x126/0x1e0
 xfsaild+0x2c5/0x890
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30

Essentially, xfs_iflush_abort() has been called twice on the
original inode that was flushed. This happens because the
inode has been flushed to the buffer successfully via
xfs_iflush_int(), and so when another inode is detected as corrupt
in xfs_iflush_cluster, the buffer is marked stale and EIO, and
iodone callbacks are run on it.

Running the iodone callbacks walks across the original inode and
calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
to xfs_iflush(), it runs the error path for that function, and that
calls xfs_iflush_abort() on the inode a second time, leading to the
above assert failure as the inode is not flush locked anymore.

This bug has been there a long time.

The simple fix would be to just avoid calling xfs_iflush_abort() in
xfs_iflush() if we've got a failure from xfs_iflush_cluster().
However, xfs_iflush_cluster() has magic delwri buffer handling that
means it may or may not have run IO completion on the buffer, and
hence sometimes we have to call xfs_iflush_abort() from
xfs_iflush(), and sometimes we shouldn't.

After reading through all the error paths and the delwri buffer
code, it's clear that the error handling in xfs_iflush_cluster() is
unnecessary. If the buffer is delwri, it leaves it on the delwri
list so that when the delwri list is submitted it sees a shutdown
filesystem in xfs_buf_submit() and that marks the buffer stale, EIO
and runs IO completion. i.e. exactly what xfs_iflush_cluster() does
when it's not a delwri buffer. Further, marking a buffer stale
clears the _XBF_DELWRI_Q flag on the buffer, which means when
submission of the buffer occurs, it just skips over it and releases
it.

IOWs, the error handling in xfs_iflush_cluster doesn't need to care
if the buffer is already on the delwri queue or not - it just
needs to mark the buffer stale, EIO and run completions. That means
we can just use the easy fix for xfs_iflush() to avoid the double
abort.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:31:38 -07:00
Dave Chinner
23fcb3340d xfs: More robust inode extent count validation
When the inode is in extent format, it can't have more extents than
fit in the inode fork. We don't currently check this, and so this
corruption goes unnoticed by the inode verifiers. This can lead to
crashes operating on invalid in-memory structures.

Attempts to access such an inode will now error out in the verifier
rather than allowing modification operations to proceed.
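
A check of this kind can be sketched in Python. The 16-byte record size, the format name strings, and the function names are assumptions made for illustration; the real verifier works on on-disk inode fields.

```python
BMBT_REC_SIZE = 16   # bytes per on-disk extent record (assumed here)


def max_extents_in_fork(fork_size_bytes):
    """Number of whole extent records that fit in an inode fork."""
    return fork_size_bytes // BMBT_REC_SIZE


def verify_extent_count(fork_format, nextents, fork_size_bytes):
    """Return True if the extent count is plausible for the fork format.

    Only the 'extents' format is checked here; other formats are
    accepted unconditionally, which is a simplification.
    """
    if fork_format == "extents":
        return nextents <= max_extents_in_fork(fork_size_bytes)
    return True
```

Rejecting the inode at verification time keeps later code from indexing past the end of the fork.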

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a typedef, add some braces and breaks to shut up compiler warnings]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:25:57 -07:00
Christoph Hellwig
e2ac836307 xfs: simplify xfs_bmap_punch_delalloc_range
Instead of using xfs_bmapi_read to find delalloc extents and then punch
them out using xfs_bunmapi, opencode the loop to iterate over the extents
and call xfs_bmap_del_extent_delay directly.  This both simplifies the
code and reduces the number of extent tree lookups required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:24:38 -07:00
Linus Torvalds
27db64f65f NFS client bugfixes for Linux 4.18
Highlights include:
 
 Bugfixes:
 - Fix an rcu deadlock in nfs_delegation_find_inode()
 - Fix NFSv4 deadlocks due to not freeing the session slot in layoutget
 - Don't send layoutreturn if the layout is already invalid
 - Prevent duplicate XID allocation
 - flexfiles: Don't tie up all the rpciod threads in resends

Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - fix an rcu deadlock in nfs_delegation_find_inode()

   - fix NFSv4 deadlocks due to not freeing the session slot in
     layoutget

   - don't send layoutreturn if the layout is already invalid

   - prevent duplicate XID allocation

   - flexfiles: Don't tie up all the rpciod threads in resends"

* tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS/flexfiles: Process writeback resends from nfsiod context as well
  pNFS/flexfiles: Don't tie up all the rpciod threads in resends
  sunrpc: Prevent duplicate XID allocation
  pNFS: Don't send layoutreturn if the layout is already invalid
  pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
  NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
2018-06-22 06:21:34 +09:00
Lu Fengqi
22883ddc66 btrfs: fix invalid-free in btrfs_extent_same
If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
		   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
is hit, we will go to free the uninitialized cmp.src_pages and
cmp.dst_pages.

Fixes: 67b07bd4be ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-21 19:21:13 +02:00
Filipe Manana
f098631848 Btrfs: fix physical offset reported by fiemap for inline extents
Commit 9d311e11fc ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents (and other extents
with a special block_start value). This is because it always sets the
variable used to report the physical offset ("disko") as em->block_start
plus some offset, and em->block_start has the value 18446744073709551614
((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

    item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
           generation 25 transid 38 size 1525 nbytes 1525
           block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
           sequence 0 flags 0x2(none)
           atime 1529342058.461891730 (2018-06-18 18:14:18)
           ctime 1529342058.461891730 (2018-06-18 18:14:18)
           mtime 1529342058.461891730 (2018-06-18 18:14:18)
           otime 1529342055.869892885 (2018-06-18 18:14:15)
    item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
           index 25 namelen 3 name: fc7
    item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
           generation 38 type 0 (inline)
           inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..    4095: 18446744073709551614..      4093:   4096:             last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
    --- tests/btrfs/004.out	2016-08-23 10:17:35.027012095 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad	2018-06-18 18:15:02.385872155 +0100
    @@ -1,3 +1,10 @@
     QA output created by 004
     *** test backref walking
    -*** done
    +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected
    +ERROR: 7.55578637259143e+22 is not a valid numeric value.
    +unexpected output from
    +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1
    ...
    (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an extent with a special block_start value.
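
The arithmetic behind the failure, and the shape of the fix, can be sketched in Python. The value (u64)-2 for inline extents is taken from the text above; the cut-off `SPECIAL_START_THRESHOLD` and the function names are assumptions for illustration, not the btrfs definitions.

```python
U64_MASK = (1 << 64) - 1
EXTENT_MAP_INLINE = (-2) & U64_MASK         # (u64)-2, as quoted above
SPECIAL_START_THRESHOLD = (-4) & U64_MASK   # assumed cut-off for special values


def physical_offset(block_start, offset_in_extent):
    """Report 0 for special extents, else block_start plus the offset."""
    if block_start >= SPECIAL_START_THRESHOLD:
        return 0
    return (block_start + offset_in_extent) & U64_MASK


def physical_offset_buggy(block_start, offset_in_extent):
    """Always add the offset, even for special block_start values."""
    return (block_start + offset_in_extent) & U64_MASK
```

Multiplying the buggy result for an inline extent by a 4096-byte block size in floating point gives about 7.5558e+22, which is the value the test output above rejects as non-numeric.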

Fixes: 9d311e11fc ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-21 19:21:13 +02:00
Christoph Hellwig
c03cea4214 iomap: add initial support for writes without buffer heads
For now just limited to blocksize == PAGE_SIZE, where we can simply read
in the full page in write begin, and just set the whole page dirty after
copying data into it.  This code is enabled by default and XFS will now
be fed pages without buffer heads in ->writepage and ->writepages.

If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old
path will still be used, this both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-20 09:32:41 -07:00
Jan Kara
6c1e4d06a3 udf: Drop unused arguments of udf_delete_aext()
udf_delete_aext() uses its last two arguments only as local variables.
Drop them.

Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:05:49 +02:00
Jan Kara
f2e8334711 udf: Provide function for calculating dir entry length
Provide a function for calculating directory entry length and use it to
reduce code duplication.

Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:05:49 +02:00
Jan Kara
fa65653e57 udf: Detect incorrect directory size
Detect when a directory entry is (possibly partially) beyond directory
size and return EIO in that case since it means the filesystem is
corrupted. Otherwise directory operations can further corrupt the
directory and possibly also oops the kernel.
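
The bounds check can be illustrated with a short Python sketch. The function names and the flat list of entry lengths are hypothetical; the actual UDF code walks on-disk file identifier descriptors.

```python
import errno


def check_dir_entry(entry_pos, entry_len, dir_size):
    """Return 0 if the entry lies inside the directory, else -EIO."""
    if entry_pos + entry_len > dir_size:
        return -errno.EIO
    return 0


def walk_directory(entry_lengths, dir_size):
    """Walk entries in order; stop at the first one that overruns.

    Returns (error, position), where position is the offset reached.
    """
    pos = 0
    for length in entry_lengths:
        err = check_dir_entry(pos, length, dir_size)
        if err:
            return err, pos
        pos += length
    return 0, pos
```

An entry that starts inside the directory but ends beyond its size is rejected as well, which covers the "possibly partially" case.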

CC: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
CC: stable@vger.kernel.org
Reported-and-tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:05:31 +02:00
Chengguang Xu
27e6ed54a3 ext2: add warning when specifying nocheck option
The nocheck option (nocheck/check=none) is useless, but for backwards
compatibility it is better to print a warning for a while before
removing it from the code completely.

This patch adds a proper warning message for the 'nocheck' option and
removes the unnecessary comment and function declaration that were
used for the removed 'check' option.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:04:26 +02:00
Jan Kara
1822193b5d quota: Cleanup list iteration in dqcache_shrink_scan()
Use list_first_entry() and list_empty() instead of opencoded variants.

Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:04:26 +02:00
Greg Thelen
9560ba306d quota: reclaim least recently used dquots
The dquots in the free_dquots list are not reclaimed in LRU order.
put_dquot_last() puts entries to the tail and dqcache_shrink_scan()
frees from the tail. Free unreferenced dquots in LRU order because it
seems more reasonable than freeing most recently used.
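
The ordering problem can be shown with a toy free list in Python. The class and method names are hypothetical and a `collections.deque` stands in for the kernel list; only the head/tail behaviour is modelled.

```python
from collections import deque


class FreeDquots:
    """Toy free list: the left end is the head, the right end is the tail."""

    def __init__(self):
        self.items = deque()

    def put_last(self, dquot):
        # Newly released entries go to the tail, so the tail is the
        # most recently used end.
        self.items.append(dquot)

    def shrink_from_tail(self, count):
        # Old behaviour: frees the most recently used entries first.
        n = min(count, len(self.items))
        return [self.items.pop() for _ in range(n)]

    def shrink_from_head(self, count):
        # LRU behaviour: frees the least recently used entries first.
        n = min(count, len(self.items))
        return [self.items.popleft() for _ in range(n)]
```

After releasing `a`, `b`, `c` in that order, shrinking two entries from the tail frees `c` and `b`, while shrinking from the head frees `a` and `b`.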

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:04:26 +02:00
Linus Torvalds
f5b65348fd proc: fix missing final NUL in get_mm_cmdline() rewrite
The rewrite of the cmdline fetching missed the fact that we used to also
return the final terminating NUL character of the last argument.  I
hadn't noticed, and none of the tools I tested cared, but something
obviously must care, because Michal Kubecek noticed the change in
behavior.

Tweak the "find the end" logic to actually include the NUL character,
and once past the end of argv, always start the strnlen() at the
expected (original) argument end.

This whole "allow people to rewrite their arguments in place" is a nasty
hack and requires that odd slop handling at the end of the argv array,
but it's our traditional model, so we continue to support it.

Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-and-tested-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-20 15:38:28 +09:00
Christoph Hellwig
72b4daa241 iomap: add an iomap-based readpage and readpages implementation
Simply use iomap_apply to iterate over the file and submit a bio for
each non-uptodate but mapped region and zero everything else.  Note that
as-is this can not be used for file systems with a blocksize smaller than
the page size, but that support will be added later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:57 -07:00
Christoph Hellwig
63899c6f88 iomap: add a page_done callback
This will be used by gfs2 to attach data to transactions for the journaled
data mode.  But the concept is generic enough that we might be able to
use it for other purposes like encryption/integrity post-processing in the
future.

Based on a patch from Andreas Gruenbacher.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:56 -07:00
Andreas Gruenbacher
19e0c58f65 iomap: generic inline data handling
Add generic inline data handling by adding a pointer to the inline data
region to struct iomap.  When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.

This doesn't cover inline data reads and direct I/O yet because so far,
we have no users.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: small cleanups to better fit in with other iomap work]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:56 -07:00
Andreas Gruenbacher
ebf00be37d iomap: complete partial direct I/O writes synchronously
According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write.  In that
case, wait for the short write to complete to allow gfs2 to recover.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:55 -07:00
Andreas Gruenbacher
3d7b6b21f6 iomap: mark newly allocated buffer heads as new
In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set.  This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:55 -07:00
Christoph Hellwig
a6d639da63 fs: factor out a __generic_write_end helper
Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-19 15:10:55 -07:00
Trond Myklebust
7b0df92ac1 pNFS/flexfiles: Process writeback resends from nfsiod context as well
Although the writeback resends are more robust than the reads, since they
are not immediately rescheduled by the same thread, we are better off
processing them in the same place as the reads.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-19 09:25:27 -04:00
Trond Myklebust
42f86b44a4 pNFS/flexfiles: Don't tie up all the rpciod threads in resends
We do not want to have rpciod threads perform recursive calls into the
RPC layer since that can deadlock. In particular, having to wait for
a layoutget can be nasty... We want rather to defer scheduling those
retries until we're in the rpc_release() callback, since that is
called from the nfsiod workqueue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-19 09:25:27 -04:00
Trond Myklebust
c8bf707353 pNFS: Don't send layoutreturn if the layout is already invalid
If the layout was invalidated due to a reboot, then don't try to send
a layoutreturn for it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-19 08:52:27 -04:00
Trond Myklebust
2dbf8dffbf pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
Right now, we can call nfs_commit_inode() while holding the session slot,
which could lead to NFSv4 deadlocks. Ensure we only keep the slot if
the server returned a layout that we have to process.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-19 08:52:27 -04:00
Linus Torvalds
ba4dbdedd3 This fixes a too-small allocation in the xattr code.

Merge tag 'jfs-4.18' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from Dave Kleikamp:
 "This fixes a too-small allocation in the xattr code"

* tag 'jfs-4.18' of git://github.com/kleikamp/linux-shaggy:
  jfs: Fix inconsistency between memory allocation and ea_buf->max_size
2018-06-19 07:47:32 +09:00
Linus Torvalds
9ffc59d572 Misc. SMB3 fixes, including particularly important ones for signing, some minor documentation and debug improvements and another posix smb3.11 fix

Merge tag '4.18-rc1-more-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Misc SMB3 fixes, including particularly important ones for signing,
  some minor documentation and debug improvements and another posix
  smb3.11 fix"

* tag '4.18-rc1-more-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix invalid check in __cifs_calc_signature()
  cifs: Use correct packet length in SMB2_TRANSFORM header
  smb3: fix corrupt path in subdirs on smb311 with posix
  smb3: do not display empty interface list
  smb3: Fix mode on mkdir on smb311 mounts
  cifs: Fix kernel oops when traceSMB is enabled
  CIFS: dump every session iface info
  CIFS: parse and store info on iface queries
  CIFS: add iface info to struct cifs_ses
  CIFS: complete PDU definitions for interface queries
  CIFS: move default port definitions to cifsglob.h
  cifs: Fix encryption/signing
  cifs: update __smb_send_rqst() to take an array of requests
  cifs: remove smb2_send_recv()
  cifs: push rfc1002 generation down the stack
  smb3: increase initial number of credits requested to allow write
  cifs: minor documentation updates
  cifs: add lease tracking to the cached root fid
  smb3: note that smb3.11 posix extensions mount option is experimental
2018-06-18 14:28:19 +09:00
Theodore Ts'o
bfe0a5f47a ext4: add more mount time checks of the superblock
The kernel's ext4 mount-time checks were more permissive than
e2fsprogs's libext2fs checks when opening a file system.  If the
superblock is considered too insane for debugfs or e2fsck to operate
on it, the kernel has no business trying to mount it.

This will make file system fuzzing tools work harder, but the failure
cases that they find will be more useful and be easier to evaluate.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-17 18:11:20 -04:00
Theodore Ts'o
c37e9e0134 ext4: add more inode number paranoia checks
If there is a directory entry pointing to a system inode (such as a
journal inode), complain and declare the file system to be corrupted.

Also, if the superblock's first inode number field is too small,
refuse to mount the file system.

This addresses CVE-2018-10882.

https://bugzilla.kernel.org/show_bug.cgi?id=200069

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-17 00:41:14 -04:00
Theodore Ts'o
8bc1379b82 ext4: avoid running out of journal credits when appending to an inline file
Use a separate journal transaction if it turns out that we need to
convert an inline file to use a data block.  Otherwise we could end
up failing due to not having journal credits.

This addresses CVE-2018-10883.

https://bugzilla.kernel.org/show_bug.cgi?id=200071

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-16 23:41:59 -04:00
Theodore Ts'o
e09463f220 jbd2: don't mark block as modified if the handle is out of credits
Do not set the b_modified flag in the block's journal head until
after we're sure that jbd2_journal_dirty_metadata() will not abort
with an error due to there not being enough space reserved in the
jbd2 handle.

Otherwise, future attempts to modify the buffer may lead to a large
number of spurious errors and warnings.

This addresses CVE-2018-10883.

https://bugzilla.kernel.org/show_bug.cgi?id=200071

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-16 20:21:45 -04:00
Linus Torvalds
5e7b9212a4 Solve a series of broken links for files under Documentation:
- can.rst: fix a footnote reference;
 - crypto_engine.rst: Fix two parsing warnings;
 - Fix a lot of broken references to Documentation/*;
 - Improves the scripts/documentation-file-ref-check script,
   in order to help detecting/fixing broken references,
   preventing false-positives.
 
 After this patch series, only 33 broken references to doc files are
 detected by scripts/documentation-file-ref-check.

Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental

Pull documentation fixes from Mauro Carvalho Chehab:
 "This solves a series of broken links for files under Documentation,
  and improves a script meant to detect such broken links (see
  scripts/documentation-file-ref-check).

  The changes on this series are:

   - can.rst: fix a footnote reference;

   - crypto_engine.rst: Fix two parsing warnings;

   - Fix a lot of broken references to Documentation/*;

   - improve the scripts/documentation-file-ref-check script, in order
     to help detecting/fixing broken references, preventing
     false-positives.

  After this patch series, only 33 broken references to doc files are
  detected by scripts/documentation-file-ref-check"

* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
  fix a series of Documentation/ broken file name references
  Documentation: rstFlatTable.py: fix a broken reference
  ABI: sysfs-devices-system-cpu: remove a broken reference
  devicetree: fix a series of wrong file references
  devicetree: fix name of pinctrl-bindings.txt
  devicetree: fix some bindings file names
  MAINTAINERS: fix location of DT npcm files
  MAINTAINERS: fix location of some display DT bindings
  kernel-parameters.txt: fix pointers to sound parameters
  bindings: nvmem/zii: Fix location of nvmem.txt
  docs: Fix more broken references
  scripts/documentation-file-ref-check: check tools/*/Documentation
  scripts/documentation-file-ref-check: get rid of false-positives
  scripts/documentation-file-ref-check: hint: dash or underline
  scripts/documentation-file-ref-check: add a fix logic for DT
  scripts/documentation-file-ref-check: accept more wildcards at filenames
  scripts/documentation-file-ref-check: fix help message
  media: max2175: fix location of driver's companion documentation
  media: v4l: fix broken video4linux docs locations
  media: dvb: point to the location of the old README.dvb-usb file
  ...
2018-06-17 05:25:18 +09:00
Linus Torvalds
dbb2816fc7 \n

Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "fsnotify cleanups unifying handling of different watch types.

  This is the shortened fsnotify series from Amir with the last five
  patches pulled out. Amir has modified those patches to not change
  struct inode but obviously it's too late for those to go into this
  merge window"

* tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: add fsnotify_add_inode_mark() wrappers
  fanotify: generalize fanotify_should_send_event()
  fsnotify: generalize send_to_group()
  fsnotify: generalize iteration of marks by object type
  fsnotify: introduce marks iteration helpers
  fsnotify: remove redundant arguments to handle_event()
  fsnotify: use type id to identify connector object type
2018-06-17 05:06:18 +09:00
Theodore Ts'o
8cdb5240ec ext4: never move the system.data xattr out of the inode body
When expanding the extra isize space, we must never move the
system.data xattr out of the inode body.  For performance reasons, it
doesn't make any sense, and the inline data implementation assumes
that system.data xattr is never in the external xattr block.

This addresses CVE-2018-10880

https://bugzilla.kernel.org/show_bug.cgi?id=200005

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-16 15:40:48 -04:00
Linus Torvalds
35773c9381 Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "Assorted AFS stuff - ended up in vfs.git since most of that consists
  of David's AFS-related followups to Christoph's procfs series"

* 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Optimise callback breaking by not repeating volume lookup
  afs: Display manually added cells in dynamic root mount
  afs: Enable IPv6 DNS lookups
  afs: Show all of a server's addresses in /proc/fs/afs/servers
  afs: Handle CONFIG_PROC_FS=n
  proc: Make inline name size calculation automatic
  afs: Implement network namespacing
  afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
  afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
  proc: Add a way to make network proc files writable
  afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
  afs: Rearrange fs/afs/proc.c to move the show routines up
  afs: Rearrange fs/afs/proc.c by moving fops and open functions down
  afs: Move /proc management functions to the end of the file
2018-06-16 16:32:04 +09:00
Linus Torvalds
29d6849d88 Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat updates from Al Viro:
 "Some biarch patches - getting rid of assorted (mis)uses of
  compat_alloc_user_space().

  Not much in that area this cycle..."

* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: simplify compat ioctl handling
  signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
  vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
2018-06-16 16:21:50 +09:00
Linus Torvalds
a5b729ea18 Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio fixes from Al Viro:
 "Assorted AIO followups and fixes"

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  eventpoll: switch to ->poll_mask
  aio: only return events requested in poll_mask() for IOCB_CMD_POLL
  eventfd: only return events requested in poll_mask()
  aio: mark __aio_sigset::sigmask const
2018-06-16 16:11:40 +09:00
Paulo Alcantara
83ffdeadb4 cifs: Fix invalid check in __cifs_calc_signature()
The following check would never evaluate to true:
  > if (i == 0 && iov[0].iov_len <= 4)

Because 'i' always starts at 1.

This patch fixes it and also moves the header checks outside the for loop
- which makes more sense.
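
A minimal sketch of why the check was dead and what hoisting it looks
like.  The struct and function names are hypothetical stand-ins, not the
actual cifs code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvec; only the length matters here. */
struct vec {
	size_t len;
};

/*
 * Buggy shape: the i == 0 test sits inside a loop that starts at 1,
 * so a too-short header in v[0] is never detected.
 */
static int check_in_loop(const struct vec *v, int n)
{
	assert(v != NULL);
	for (int i = 1; i < n; i++) {
		if (i == 0 && v[0].len <= 4)
			return -1;	/* unreachable */
	}
	return 0;
}

/* Fixed shape: validate the header once, before the loop. */
static int check_before_loop(const struct vec *v, int n)
{
	assert(v != NULL);
	if (n < 1 || v[0].len <= 4)
		return -1;
	for (int i = 1; i < n; i++) {
		/* per-segment work would go here */
	}
	return 0;
}
```

With a 4-byte first segment, check_in_loop() accepts the input while
check_before_loop() rejects it.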

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 19:17:40 -05:00
Paulo Alcantara
35e2cc1ba7 cifs: Use correct packet length in SMB2_TRANSFORM header
In smb3_init_transform_rq(), 'orig_len' was only counting the request
length, but forgot to count any data pages in the request.

Writing or creating files with the 'seal' mount option was broken.

In addition, do some code refactoring by exporting smb2_rqst_len() to
calculate the appropriate packet size and avoid duplicating the same
calculation all over the code.

The start of the io vector is either the rfc1002 length (4 bytes) or an
SMB2 header which is always > 4. Use this fact to check and skip the
rfc1002 length if requested.
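
The length calculation described above can be sketched roughly as
follows.  The struct layout and names are assumptions made for
illustration; this is not the exported smb2_rqst_len() itself:

```c
#include <assert.h>
#include <stddef.h>

struct seg {
	size_t len;
};

/* Hypothetical request: io vector segments plus trailing data pages. */
struct request {
	const struct seg *segs;
	int nsegs;
	unsigned int npages;	/* number of data pages */
	size_t page_size;	/* size of every page but the last */
	size_t tail_size;	/* bytes used in the last page */
};

static size_t request_len(const struct request *rq, int skip_rfc1002)
{
	size_t len = 0;
	int first = 0;

	assert(rq != NULL);
	/* A leading 4-byte segment can only be the rfc1002 length. */
	if (skip_rfc1002 && rq->nsegs > 0 && rq->segs[0].len == 4)
		first = 1;
	for (int i = first; i < rq->nsegs; i++)
		len += rq->segs[i].len;
	/* Count the data pages too, which is the part that was missed. */
	if (rq->npages > 0)
		len += (size_t)(rq->npages - 1) * rq->page_size + rq->tail_size;
	return len;
}
```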

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 19:17:40 -05:00
Mauro Carvalho Chehab
44348e8ac1 fix a series of Documentation/ broken file name references
As files move around, their previous links break. Fix the
references for them.

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-15 18:10:01 -03:00
Mauro Carvalho Chehab
34962fb807 docs: Fix more broken references
As we move stuff around, some doc references are broken. Fix some of
them via this script:
	./scripts/documentation-file-ref-check --fix

Manually checked that produced results are valid.

Acked-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-15 18:11:26 -03:00
Theodore Ts'o
6e8ab72a81 ext4: clear i_data in ext4_inode_info when removing inline data
When converting an inode from storing the data in-line to a data
block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
copy of the i_blocks[] array.  It was not clearing the copy of
i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
used by ext4_map_blocks().

This didn't matter much if we are using extents, since the extents
header would be invalid and thus the extents code would re-initialize
the extents tree.  But if we are using indirect blocks, the previous
contents of the i_blocks array will be treated as block numbers, with
potentially catastrophic results to the file system integrity and/or
user data.

This gets worse if the file system is using a 1k block size and
s_first_data is zero, but even without this, the file system can get
quite badly corrupted.

This addresses CVE-2018-10881.

https://bugzilla.kernel.org/show_bug.cgi?id=200015

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-15 12:28:16 -04:00
Theodore Ts'o
bdbd6ce01a ext4: include the illegal physical block in the bad map ext4_error msg
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-15 12:27:16 -04:00
David Howells
47ea0f2ebf afs: Optimise callback breaking by not repeating volume lookup
At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through the use of an indirection table: we cannot do
     an insertion sort on the afs_callback_break array itself as we decode
     FIDs into it, because we subsequently also have to decode callback
     info into it that corresponds by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it.

 (2) Sort the server->cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
     possible, caching the start point in the array for that volume group
     as long as possible.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.
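
The indirection-table idea in (1) can be sketched as below: the decoded
array is left in place so that later data can still be matched by index,
and only a table of pointers is sorted.  The names are hypothetical, and
qsort() stands in for whatever sort the patch actually uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fid_entry {
	unsigned int vid;
	unsigned int vnode;
};

/* Order pointers by vid; break ties by position in the original array. */
static int cmp_vid(const void *a, const void *b)
{
	const struct fid_entry *x = *(const struct fid_entry *const *)a;
	const struct fid_entry *y = *(const struct fid_entry *const *)b;

	if (x->vid != y->vid)
		return x->vid < y->vid ? -1 : 1;
	return (x > y) - (x < y);
}

/* Fill view[] with pointers into fids[] and sort only the pointers. */
static void build_sorted_view(const struct fid_entry *fids, size_t n,
			      const struct fid_entry **view)
{
	assert(n == 0 || (fids != NULL && view != NULL));
	for (size_t i = 0; i < n; i++)
		view[i] = &fids[i];
	qsort(view, n, sizeof(*view), cmp_vid);
}
```

Walking view[] then visits all FIDs of one volume consecutively, while
fids[i] still lines up with the i-th callback record.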

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
David Howells
0da0b7fd73 afs: Display manually added cells in dynamic root mount
Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].

To this end:

 (1) Only one dynamic root superblock is now created per network namespace
     and this is shared between all attempts to mount it.  This makes it
     easier to find the superblock to modify.

 (2) When a dynamic root superblock is created, the list of cells is walked
     and directories created for each cell already defined.

 (3) When a new cell is added, if a dynamic root superblock exists, a
     directory is created for it.

 (4) When a cell is destroyed, the directory is removed.

 (5) These directories are created by calling lookup_one_len() on the root
     dir which automatically creates them if they don't exist.

[*] Inasmuch as network namespaces are currently supported here.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
David Howells
c88d5a7fff afs: Enable IPv6 DNS lookups
Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses
from being looked up.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
Steve French
d819d298c7 smb3: fix corrupt path in subdirs on smb311 with posix
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Steve French
115d5d288d smb3: do not display empty interface list
If server does not support listing interfaces then do not
display empty "Server interfaces" line to avoid confusing users.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
2018-06-15 02:38:08 -05:00
Steve French
bea851b8ba smb3: Fix mode on mkdir on smb311 mounts
mkdir was not passing the mode on smb3.11 mounts with posix extensions

Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Paulo Alcantara
662bf5bc0a cifs: Fix kernel oops when traceSMB is enabled
When traceSMB is enabled through 'echo 1 > /proc/fs/cifs/traceSMB', after a
mount, the following oops is triggered:

[   27.137943] BUG: unable to handle kernel paging request at
ffff8800f80c268b
[   27.143396] PGD 2c6b067 P4D 2c6b067 PUD 0
[   27.145386] Oops: 0000 [#1] SMP PTI
[   27.146186] CPU: 2 PID: 2655 Comm: mount.cifs Not tainted 4.17.0+ #39
[   27.147174] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.0.0-prebuilt.qemu-project.org 04/01/2014
[   27.148969] RIP: 0010:hex_dump_to_buffer+0x413/0x4b0
[   27.149738] Code: 48 8b 44 24 08 31 db 45 31 d2 48 89 6c 24 18 44 89
6c 24 24 48 c7 c1 78 b5 23 82 4c 89 64 24 10 44 89 d5 41 89 dc 4c 8d 58
02 <44> 0f b7 00 4d 89 dd eb 1f 83 c5 01 41 01 c4 41 39 ef 0f 84 48 fe
[   27.152396] RSP: 0018:ffffc9000058f8c0 EFLAGS: 00010246
[   27.153129] RAX: ffff8800f80c268b RBX: 0000000000000000 RCX:
ffffffff8223b578
[   27.153867] RDX: 0000000000000000 RSI: ffffffff81a55496 RDI:
0000000000000008
[   27.154612] RBP: 0000000000000000 R08: 0000000000000020 R09:
0000000000000083
[   27.155355] R10: 0000000000000000 R11: ffff8800f80c268d R12:
0000000000000000
[   27.156101] R13: 0000000000000002 R14: ffffc9000058f94d R15:
0000000000000008
[   27.156838] FS:  00007f1693a6b740(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[   27.158354] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.159093] CR2: ffff8800f80c268b CR3: 00000000798fa001 CR4:
0000000000360ee0
[   27.159892] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   27.160661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   27.161464] Call Trace:
[   27.162123]  print_hex_dump+0xd3/0x160
[   27.162814] journal-offline (2658) used greatest stack depth: 13144
bytes left
[   27.162824]  ? __release_sock+0x60/0xd0
[   27.165344]  ? tcp_sendmsg+0x31/0x40
[   27.166177]  dump_smb+0x39/0x40
[   27.166972]  ? vsnprintf+0x236/0x490
[   27.167807]  __smb_send_rqst.constprop.12+0x103/0x430
[   27.168554]  ? apic_timer_interrupt+0xa/0x20
[   27.169306]  smb_send_rqst+0x48/0xc0
[   27.169984]  cifs_send_recv+0xda/0x420
[   27.170639]  SMB2_negotiate+0x23d/0xfa0
[   27.171301]  ? vsnprintf+0x236/0x490
[   27.171961]  ? smb2_negotiate+0x19/0x30
[   27.172586]  smb2_negotiate+0x19/0x30
[   27.173257]  cifs_negotiate_protocol+0x70/0xd0
[   27.173935]  ? kstrdup+0x43/0x60
[   27.174551]  cifs_get_smb_ses+0x295/0xbe0
[   27.175260]  ? lock_timer_base+0x67/0x80
[   27.175936]  ? __internal_add_timer+0x1a/0x50
[   27.176575]  ? add_timer+0x10f/0x230
[   27.177267]  cifs_mount+0x101/0x1190
[   27.177940]  ? cifs_smb3_do_mount+0x144/0x5c0
[   27.178575]  cifs_smb3_do_mount+0x144/0x5c0
[   27.179270]  mount_fs+0x35/0x150
[   27.179930]  vfs_kern_mount.part.28+0x54/0xf0
[   27.180567]  do_mount+0x5ad/0xc40
[   27.181234]  ? kmem_cache_alloc_trace+0xed/0x1a0
[   27.181916]  ksys_mount+0x80/0xd0
[   27.182535]  __x64_sys_mount+0x21/0x30
[   27.183220]  do_syscall_64+0x4e/0x100
[   27.183882]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   27.184535] RIP: 0033:0x7f169339055a
[   27.185192] Code: 48 8b 0d 41 d9 2b 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0e d9 2b 00 f7 d8 64 89 01 48
[   27.187268] RSP: 002b:00007fff7b44eb58 EFLAGS: 00000202 ORIG_RAX:
00000000000000a5
[   27.188515] RAX: ffffffffffffffda RBX: 00007f1693a7e70e RCX:
00007f169339055a
[   27.189244] RDX: 000055b9f97f64e5 RSI: 000055b9f97f652c RDI:
00007fff7b45074f
[   27.189974] RBP: 000055b9fb8c9260 R08: 000055b9fb8ca8f0 R09:
0000000000000000
[   27.190721] R10: 0000000000000000 R11: 0000000000000202 R12:
000055b9fb8ca8f0
[   27.191429] R13: 0000000000000000 R14: 00007f1693a7c000 R15:
00007f1693a7e91d
[   27.192167] Modules linked in:
[   27.192797] CR2: ffff8800f80c268b
[   27.193435] ---[ end trace 67404c618badf323 ]---

The problem was that dump_smb() had been called with an invalid pointer,
that is, in __smb_send_rqst(), iov[1] doesn't exist (n_vec == 1).

This patch fixes it by relying on the n_vec value to dump out the smb
packets.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-06-15 02:38:08 -05:00
Aurelien Aptel
bc0fe8b207 CIFS: dump every session iface info
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Aurelien Aptel
fe856be475 CIFS: parse and store info on iface queries
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Aurelien Aptel
b6f0dd5d75 CIFS: add iface info to struct cifs_ses
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Aurelien Aptel
bead042ccc CIFS: complete PDU definitions for interface queries
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Aurelien Aptel
e2292430c4 CIFS: move default port definitions to cifsglob.h
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Paulo Alcantara
cd2dca60be cifs: Fix encryption/signing
Since the rfc1002 generation was moved down to __smb_send_rqst(),
the transform header is now in rqst->rq_iov[0].

Correctly assign the transform header pointer in crypt_message().

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:08 -05:00
Ronnie Sahlberg
07cd952f3a cifs: update __smb_send_rqst() to take an array of requests
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-06-15 02:38:08 -05:00
Ronnie Sahlberg
40eff45b5d cifs: remove smb2_send_recv()
Now that we have the plumbing to pass requests without an rfc1002
header all the way down to the point we write to the socket we no
longer need the smb2_send_recv() function.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-06-15 02:38:08 -05:00
Ronnie Sahlberg
c713c8770f cifs: push rfc1002 generation down the stack
Move the generation of the 4 byte length field down the stack and
generate it immediately before we start writing the data to the socket.
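
As a rough illustration, generating the 4 byte length field just before
the write could look like this.  The helper name is hypothetical and the
sketch assumes the length goes on the wire in big-endian order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode the payload length as a 4-byte big-endian prefix. */
static void put_rfc1002_len(uint8_t out[4], uint32_t len)
{
	assert(out != NULL);
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
}
```

Callers higher up the stack then only deal with the SMB2 payload and
never need to reserve or patch the prefix themselves.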

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-06-15 02:38:08 -05:00
Steve French
d409014e4f smb3: increase initial number of credits requested to allow write
Compared to other clients the Linux smb3 client ramps up
credits very slowly, taking more than 128 operations before a
maximum size write could be sent (since the number of credits
requested is only 2 per small operation, causing the credit
limit to grow very slowly).

This lack of credits initially would impact large i/o performance,
when large i/o is tried early before enough credits are built up.

Signed-off-by: Steve French <stfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-06-15 02:38:08 -05:00
Ronnie Sahlberg
a93864d939 cifs: add lease tracking to the cached root fid
Use a read lease for the cached root fid so that we can detect
when the content of the directory changes (via a break) at which time
we close the handle. On next access to the root the handle will be reopened
and cached again.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:07 -05:00
Steve French
2fbb56446f smb3: note that smb3.11 posix extensions mount option is experimental
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-06-15 02:38:07 -05:00
David Howells
0aac4bce4b afs: Show all of a server's addresses in /proc/fs/afs/servers
Show all of a server's addresses in /proc/fs/afs/servers, placing the
second and subsequent addresses on padded lines of their own.  The
current address is marked with a star.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 00:52:59 -04:00
David Howells
b6cfbecafb afs: Handle CONFIG_PROC_FS=n
The AFS filesystem depends at the moment on /proc for configuration and
also presents information that way - however, this causes a compilation
failure if procfs is disabled.

Fix it so that the procfs bits aren't compiled in if procfs is disabled.

This means that you can't configure the AFS filesystem directly, but it is
still usable provided that an up-to-date keyutils is installed to look up
cells by SRV or AFSDB DNS records.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 00:52:55 -04:00
David Howells
24074a35c5 proc: Make inline name size calculation automatic
Make calculation of the size of the inline name in struct proc_dir_entry
automatic, rather than having to manually encode the numbers and failing to
allow for lockdep.

Require a minimum inline name size of 33+1 to allow for names that look
like two hex numbers with a dash between.
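
The 33+1 figure can be checked with a quick sketch, assuming the names in
question are two 64-bit values printed in hex with a dash between them.
The format string is an assumption for illustration and is not taken
from the patch:

```c
#include <assert.h>
#include <stdio.h>

/* Length of "<hex>-<hex>", not counting the terminating NUL. */
static int hexpair_len(unsigned long long start, unsigned long long end)
{
	int n = snprintf(NULL, 0, "%llx-%llx", start, end);

	assert(n > 0);
	return n;
}
```

Two full-width 64-bit values give 16 + 1 + 16 = 33 characters, and the
extra 1 is for the terminating NUL.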

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15 00:48:57 -04:00
Al Viro
430ff79170 orangefs: simplify compat ioctl handling
no need to mess with copy_in_user(), etc...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15 00:23:55 -04:00
Al Viro
5ed0127fc3 signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15 00:23:50 -04:00
Ben Noordhuis
11c5ad0ec4 eventpoll: switch to ->poll_mask
Signed-off-by: Ben Noordhuis <info@bnoordhuis.nl>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-14 20:09:28 -04:00
Christoph Hellwig
2739b807b0 aio: only return events requested in poll_mask() for IOCB_CMD_POLL
The ->poll_mask() operation has a mask of events that the caller
is interested in, but not all implementations might take it into
account.  Mask the return value to only the requested events,
similar to what the poll and epoll code does.

Reported-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-14 20:08:14 -04:00
Avi Kivity
4d572d9f46 eventfd: only return events requested in poll_mask()
The ->poll_mask() operation has a mask of events that the caller
is interested in, but we're returning all events regardless.

Change to return only the events the caller is interested in. This
fixes aio IOCB_CMD_POLL returning immediately when called with POLLIN
on an eventfd, since an eventfd is almost always ready for a write.
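
The effect of the masking can be sketched in userspace terms.  The
function names are hypothetical; only the POLLIN/POLLOUT constants from
<poll.h> are real:

```c
#include <assert.h>
#include <poll.h>

/* Report only the events the caller asked about. */
static short filter_events(short ready, short requested)
{
	return (short)(ready & requested);
}

/* An eventfd with a zero counter: writable, nothing to read. */
static void eventfd_case(void)
{
	short ready = POLLOUT;

	/* Unfiltered, a POLLIN waiter would see a non-zero mask and wake. */
	assert(ready != 0);
	/* Filtered, the POLLIN waiter sees nothing and keeps waiting. */
	assert(filter_events(ready, POLLIN) == 0);
}
```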

Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-14 20:07:38 -04:00
Linus Torvalds
b5d903c2d6 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - MM remainders

 - various misc things

 - kcov updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  lib/test_printf.c: call wait_for_random_bytes() before plain %p tests
  hexagon: drop the unused variable zero_page_mask
  hexagon: fix printk format warning in setup.c
  mm: fix oom_kill event handling
  treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
  mm: use octal not symbolic permissions
  ipc: use new return type vm_fault_t
  sysvipc/sem: mitigate semnum index against spectre v1
  fault-injection: reorder config entries
  arm: port KCOV to arm
  sched/core / kcov: avoid kcov_area during task switch
  kcov: prefault the kcov_area
  kcov: ensure irq code sees a valid area
  kernel/relay.c: change return type to vm_fault_t
  exofs: avoid VLA in structures
  coredump: fix spam with zero VMA process
  fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
  proc: skip branch in /proc/*/* lookup
  mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
  mm/memblock: add missing include <linux/bootmem.h>
  ...
2018-06-15 08:51:42 +09:00
Kees Cook
20fe935358 exofs: avoid VLA in structures
On the quest to remove all VLAs from the kernel[1] this adjusts several
cases where allocation is made after an array of structures that points
back into the allocation.  The allocations are changed to perform
explicit calculations instead of using a Variable Length Array in a
structure.

Additionally, this lets Clang compile this code now, since Clang does
not support VLAIS[2].
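
A minimal userspace sketch of the pattern, with an explicit size
calculation and a pointer back into the same allocation.  The struct and
function names are hypothetical, calloc() stands in for the kernel
allocator, and overflow of the size calculation is not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dev {
	int id;
};

struct comps {
	unsigned int numdevs;
	struct dev **devs;	/* points just past this struct */
};

static struct comps *comps_alloc(unsigned int numdevs)
{
	/* Header plus trailing pointer array, computed by hand. */
	size_t size = sizeof(struct comps) + numdevs * sizeof(struct dev *);
	struct comps *c;

	assert(size >= sizeof(struct comps));
	c = calloc(1, size);
	if (!c)
		return NULL;
	c->numdevs = numdevs;
	c->devs = (struct dev **)(c + 1);
	return c;
}
```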

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://lkml.kernel.org/r/CA+55aFy6h1c3_rP_bXFedsTXzwW+9Q9MfJaW7GUmMBrAp-fJ9A@mail.gmail.com

[keescook@chromium.org: v2]
  Link: http://lkml.kernel.org/r/20180418163546.GA45794@beast
Link: http://lkml.kernel.org/r/20180327203904.GA1151@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
Alexey Dobriyan
86a2bb5ad8 coredump: fix spam with zero VMA process
Nobody ever tried to self destruct by unmapping whole address space at
once:

	munmap((void *)0, (1ULL << 47) - 4096);

Doing this produces 2 warnings for zero-length vmalloc allocations:

  a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
	...
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
	...

Fix is to switch to kvmalloc().

Steps to reproduce:

	// vsyscall=none
	#include <sys/mman.h>
	#include <sys/resource.h>
	int main(void)
	{
		setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY});
		munmap((void *)0, (1ULL << 47) - 4096);
		return 0;
	}

Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
OGAWA Hirofumi
c2574aaa5d fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
If the file size and the FAT cluster chain do not match (corrupted
image), we can hit BUG_ON(!phys) in __fat_get_block().

So, use fat_fs_error() instead.

[hirofumi@mail.parknet.co.jp: fix printk warning]
  Link: http://lkml.kernel.org/r/87po12aq5p.fsf@mail.parknet.co.jp
Link: http://lkml.kernel.org/r/874lilcu67.fsf@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
Alexey Dobriyan
26b95137d6 proc: skip branch in /proc/*/* lookup
Code is structured like this:

	for ( ... p < last; p++) {
		if (memcmp == 0)
			break;
	}
	if (p >= last)
		ERROR
	OK

gcc doesn't see that if the lookup succeeds then the post-loop branch
will never be taken, so skip it explicitly.
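
One way to restructure such a loop so the success path never reaches the
post-loop test is sketched below.  The table type and names are
hypothetical and strcmp() replaces the memcmp() of the real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry {
	const char *name;
	int value;
};

static int lookup(const struct entry *tab, size_t n, const char *name,
		  int *out)
{
	assert(out != NULL);
	for (const struct entry *p = tab; p < tab + n; p++) {
		if (strcmp(p->name, name) == 0) {
			/* Success leaves from here, no second p >= last test. */
			*out = p->value;
			return 0;
		}
	}
	/* Only a failed search falls out of the loop. */
	return -1;
}
```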

[akpm@linux-foundation.org: proc_pident_instantiate() no longer takes an inode*]
Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15 07:55:24 +09:00
Linus Torvalds
7a932516f5 vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated
 treewide conversion of the inode and iattr structures from 'timespec'
 to 'timespec64', to push the conversion from the VFS layer into the
 individual file systems.
 
 There were no conflicts between this and the contents of linux-next
 until just before the merge window, when we saw multiple problems:
 
 - A minor conflict with my own y2038 fixes, which I could address
   by adding another patch on top here.
 - One semantic conflict with late changes to the NFS tree. I addressed
   this by merging Deepa's original branch on top of the changes that
   now got merged into mainline and making sure the merge commit includes
   the necessary changes as produced by coccinelle.
 - A trivial conflict against the removal of staging/lustre.
 - Multiple conflicts against the VFS changes in the overlayfs tree.
   These are still part of linux-next, but apparently this is no longer
   intended for 4.18 [1], so I am ignoring that part.
 
 As Deepa writes:
 
   The series aims to switch vfs timestamps to use struct timespec64.
   Currently vfs uses struct timespec, which is not y2038 safe.
 
   The series involves the following:
   1. Add vfs helper functions for supporting struct timespec64 timestamps.
   2. Cast prints of vfs timestamps to avoid warnings after the switch.
   3. Simplify code using vfs timestamps so that the actual
      replacement becomes easy.
   4. Convert vfs timestamps to use struct timespec64 using a script.
      This is a flag day patch.
 
   Next steps:
   1. Convert APIs that can handle timespec64, instead of converting
      timestamps at the boundaries.
   2. Update internal data structures to avoid timestamp conversions.
 
 Thomas Gleixner adds:
 
   I think there is no point to drag that out for the next merge window.
   The whole thing needs to be done in one go for the core changes which
   means that you're going to play that catchup game forever. Let's get
   over with it towards the end of the merge window.
 
 [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html

Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"
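
To illustrate why a 32-bit seconds field is not y2038 safe, here is a
small userspace sketch. struct ts64 only mirrors the general shape of a
64-bit timestamp (64-bit seconds plus nanoseconds); the type and both
helper names are assumptions made for this example, not kernel API.

```c
#include <stdint.h>

/* Userspace stand-in for a timestamp with 64-bit seconds. */
struct ts64 {
	int64_t	tv_sec;		/* seconds, 64-bit on every architecture */
	long	tv_nsec;	/* nanoseconds */
};

/* Return 1 if `sec` fits in a signed 32-bit seconds field, else 0. */
static int fits_in_32bit_seconds(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}

/* Add `delta` seconds to a timestamp without narrowing the value. */
static struct ts64 ts64_add_seconds(struct ts64 t, int64_t delta)
{
	t.tv_sec += delta;
	return t;
}
```

INT32_MAX seconds after the epoch is 2038-01-19 03:14:07 UTC; one
second later the value no longer fits in 32 bits, while the 64-bit
field holds it unchanged.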

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
2018-06-15 07:31:07 +09:00
Linus Torvalds
dc594c39f7 The main piece is a set of libceph changes that revamps how OSD
requests are aborted, improving CephFS ENOSPC handling and making
 "umount -f" actually work (Zheng and myself).  The rest is mostly
 mount option handling cleanups from Chengguang and assorted fixes
 from Zheng, Luis and Dongsheng.

Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The main piece is a set of libceph changes that revamps how OSD
  requests are aborted, improving CephFS ENOSPC handling and making
  'umount -f' actually work (Zheng and myself).

  The rest is mostly mount option handling cleanups from Chengguang and
  assorted fixes from Zheng, Luis and Dongsheng."

* tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
  rbd: flush rbd_dev->watch_dwork after watch is unregistered
  ceph: update description of some mount options
  ceph: show ino32 if the value is different with default
  ceph: strengthen rsize/wsize/readdir_max_bytes validation
  ceph: fix alignment of rasize
  ceph: fix use-after-free in ceph_statfs()
  ceph: prevent i_version from going back
  ceph: fix wrong check for the case of updating link count
  libceph: allocate the locator string with GFP_NOFAIL
  libceph: make abort_on_full a per-osdc setting
  libceph: don't abort reads in ceph_osdc_abort_on_full()
  libceph: avoid a use-after-free during map check
  libceph: don't warn if req->r_abort_on_full is set
  libceph: use for_each_request() in ceph_osdc_abort_on_full()
  libceph: defer __complete_request() to a workqueue
  libceph: move more code into __complete_request()
  libceph: no need to call flush_workqueue() before destruction
  ceph: flush pending works before shutdown super
  ceph: abort osd requests on force umount
  libceph: introduce ceph_osdc_abort_requests()
  ...
2018-06-15 07:24:58 +09:00
Linus Torvalds
e7655d2b25 for-4.18-part2-tag

Merge tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - error handling fixup for one of the new ioctls from 1st pull

 - fix for device-replace that incorrectly uses inode pages and can mess
   up compressed extents in some cases

 - fiemap fix for reporting incorrect number of extents

 - vm_fault_t type conversion

* tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: Don't use inode pages for device replace
  btrfs: change return type of btrfs_page_mkwrite to vm_fault_t
  Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero
  btrfs: Check error of btrfs_iget in btrfs_search_path_in_tree_user
2018-06-15 07:23:00 +09:00
Anna Schumaker
d5681f59ee NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
I was able to reproduce this pretty regularly using xfstests
generic/013 on NFS v4.0.

Reported-by: Ross Zwisler <Ross.Zwisler@linux.intel.com>
Fixes: 6c34265502 (NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab())
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-14 14:05:38 -04:00
Theodore Ts'o
bc890a6024 ext4: verify the depth of extent tree in ext4_find_extent()
If there is a corrupted file system where the claimed depth of the
extent tree is -1, this can cause a massive buffer overrun leading to
sadness.
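
A minimal sketch of this kind of check, written for userspace:
SKETCH_MAX_DEPTH, check_extent_depth() and the -EINVAL return value are
assumptions for illustration and are not taken from the patch.

```c
#include <errno.h>

/* Assumed upper bound on tree depth, for this sketch only. */
#define SKETCH_MAX_DEPTH 5

/*
 * Validate a depth value read from disk before it is used to size or
 * index a per-level path array. Returns 0 if usable, -EINVAL if not.
 */
static int check_extent_depth(int depth)
{
	if (depth < 0 || depth > SKETCH_MAX_DEPTH)
		return -EINVAL;
	return 0;
}
```

Rejecting the value up front means a claimed depth of -1 never reaches
the code that allocates and walks the path array.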

This addresses CVE-2018-10877.

https://bugzilla.kernel.org/show_bug.cgi?id=199417

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-14 12:55:10 -04:00
Arnd Bergmann
e264abeaf9 pstore: Remove bogus format string definition
The pstore conversion to timespec64 introduces its own method of passing
seconds into sscanf() and sprintf() type functions to work around the
timespec64 definition on 64-bit systems that redefine it to 'timespec'.

That hack is now finally getting removed, but that means we get a (harmless)
warning once both patches are merged:

fs/pstore/ram.c: In function 'ramoops_read_kmsg_hdr':
fs/pstore/ram.c:39:29: error: format '%ld' expects argument of type 'long int *', but argument 3 has type 'time64_t *' {aka 'long long int *'} [-Werror=format=]
 #define RAMOOPS_KERNMSG_HDR "===="
                             ^~~~~~
fs/pstore/ram.c:167:21: note: in expansion of macro 'RAMOOPS_KERNMSG_HDR'

This removes the pstore specific workaround and uses the same method that
we have in place for all other functions that print a timespec64.

Related to this, I found that the kasprintf() output contains incorrect
nanosecond values for any number starting with zeroes, so I adapt the
format string accordingly.
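
Both points can be sketched in userspace with snprintf(). The helper
name format_stamp() and the choice of microsecond precision are
assumptions for this example: casting the seconds to long long makes
the argument match %lld however the 64-bit type is defined, and a
zero-padded width keeps leading zeroes in the fractional part.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format "<seconds>.<microseconds>" into buf and return the length
 * snprintf() reports. Without the %06 padding, 42 microseconds would
 * print as ".42" and read back as a different value.
 */
static int format_stamp(char *buf, size_t len, int64_t sec, long nsec)
{
	return snprintf(buf, len, "%lld.%06lu",
			(long long)sec, (unsigned long)(nsec / 1000));
}
```

With sec = 5 and nsec = 42000 this produces "5.000042" rather than
"5.42".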

Link: https://lkml.org/lkml/2018/5/19/115
Link: https://lkml.org/lkml/2018/5/16/1080
Fixes: 0f0d83b99ef7 ("pstore: Convert internal records to timespec64")
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-14 14:57:24 +02:00
Arnd Bergmann
15eefe2a99 Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into vfs-timespec64
Pull the timespec64 conversion from Deepa Dinamani:
 "The series aims to switch vfs timestamps to use
  struct timespec64. Currently vfs uses struct timespec,
  which is not y2038 safe.

  The flag patch applies cleanly. I've not seen the timestamps
  update logic change often. The series applies cleanly on 4.17-rc6
  and linux-next tip (top commit: next-20180517).

  I'm not sure how to merge this kind of a series with a flag patch.
  We are targeting 4.18 for this.
  Let me know if you have other suggestions.

  The series involves the following:
  1. Add vfs helper functions for supporting struct timespec64 timestamps.
  2. Cast prints of vfs timestamps to avoid warnings after the switch.
  3. Simplify code using vfs timestamps so that the actual
     replacement becomes easy.
  4. Convert vfs timestamps to use struct timespec64 using a script.
     This is a flag day patch.

  I've tried to keep the conversions with the script simple, to
  aid in the reviews. I've kept all the internal filesystem data
  structures and function signatures the same.

  Next steps:
  1. Convert APIs that can handle timespec64, instead of converting
     timestamps at the boundaries.
  2. Update internal data structures to avoid timestamp conversions."

I've pulled it into a branch based on top of the NFS changes that
are now in mainline, so I could resolve the non-obvious conflict
between the two while merging.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-14 14:54:00 +02:00
Theodore Ts'o
8844618d8a ext4: only look at the bg_flags field if it is valid
The bg_flags field in the block group descriptors is only valid if the
uninit_bg or metadata_csum feature is enabled.  We were not
consistently looking at this field; fix this.

Also, block group #0 must never have uninitialized allocation bitmaps
or need to be zeroed, since that's where the root inode and other
special inodes are set up.  Check for these conditions and mark the
file system as corrupted if they are detected.
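
The two rules can be sketched as follows. All names and bit values here
(SK_BG_BLOCK_UNINIT, SK_FEAT_*, bg_flags_valid(), block_bitmap_uninit())
are hypothetical and chosen for this example; they are not the kernel's
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag and feature bits, for this sketch only. */
#define SK_BG_BLOCK_UNINIT	0x1u
#define SK_FEAT_UNINIT_BG	0x1u
#define SK_FEAT_METADATA_CSUM	0x2u

/* bg_flags is meaningful only when one of the two features is enabled. */
static bool bg_flags_valid(uint32_t features)
{
	return (features & (SK_FEAT_UNINIT_BG | SK_FEAT_METADATA_CSUM)) != 0;
}

/*
 * Return true if the block bitmap of `group` should be treated as
 * uninitialized. Sets *corrupt when group 0 claims to be
 * uninitialized, which must never happen.
 */
static bool block_bitmap_uninit(uint32_t features, uint32_t group,
				uint16_t bg_flags, bool *corrupt)
{
	*corrupt = false;
	if (!bg_flags_valid(features))
		return false;		/* ignore stale flag bits */
	if (!(bg_flags & SK_BG_BLOCK_UNINIT))
		return false;
	if (group == 0) {
		*corrupt = true;	/* group 0 must be initialized */
		return false;
	}
	return true;
}
```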

This addresses CVE-2018-10876.

https://bugzilla.kernel.org/show_bug.cgi?id=199403

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-06-14 00:58:00 -04:00