Commit graph

46693 commits

Author SHA1 Message Date
Theodore Ts'o
6da22013bb Merge branch 'fscrypt' into origin 2016-11-13 22:02:22 -05:00
Theodore Ts'o
a2f6d9c4c0 Merge branch 'dax-4.10-iomap-pmd' into origin 2016-11-13 22:02:15 -05:00
Eric Biggers
a6e0891286 fscrypto: don't use on-stack buffer for key derivation
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  get_crypt_info() was using a stack buffer to hold the
output from the encryption operation used to derive the per-file key.
Fix it by using a heap buffer.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 21:56:25 -05:00
Eric Biggers
08ae877f4e fscrypto: don't use on-stack buffer for filename encryption
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page.  For short filenames, fname_encrypt() was encrypting a
stack buffer holding the padded filename.  Fix it by encrypting the
filename in-place in the output buffer, thereby making the temporary
buffer unnecessary.

This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 21:56:19 -05:00
David Gstir
9c4bb8a3a9 fscrypt: Let fs select encryption index/tweak
Avoid re-use of page index as tweak for AES-XTS when multiple parts of
same page are encrypted. This will happen on multiple (partial) calls of
fscrypt_encrypt_page on same page.
page->index is only valid for writeback pages.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 20:18:16 -05:00
David Gstir
0b93e1b94b fscrypt: Constify struct inode pointer
Some filesystems, such as UBIFS, maintain a const pointer for struct
inode.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 20:18:01 -05:00
David Gstir
7821d4dd45 fscrypt: Enable partial page encryption
Not all filesystems work on full pages, thus we should allow them to
hand partial pages to fscrypt for en/decryption.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:55:21 -05:00
David Gstir
b50f7b268b fscrypt: Allow fscrypt_decrypt_page() to function with non-writeback pages
Some filesystem might pass pages which do not have page->mapping->host
set to the encrypted inode. We want the caller to explicitly pass the
corresponding inode.

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:53:10 -05:00
David Gstir
1c7dcf69ee fscrypt: Add in-place encryption mode
ext4 and f2fs require a bounce page when encrypting pages. However, not
all filesystems will need that (eg. UBIFS). This is handled via a
flag on fscrypt_operations where a fs implementation can select in-place
encryption over using a bounce page (which is the default).

Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13 18:47:04 -05:00
Jan Kara
9484ab1bf4 dax: Introduce IOMAP_FAULT flag
Introduce a flag telling iomap operations whether they are handling a
fault or other IO. That may influence behavior wrt inode size and
similar things.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-10 10:26:50 +11:00
Ross Zwisler
190b5caad7 dax: remove "depends on BROKEN" from FS_DAX_PMD
Now that DAX PMD faults are once again working and are now participating in
DAX's radix tree locking scheme, allow their config option to be enabled.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:35:16 +11:00
Ross Zwisler
862f1b9d67 xfs: use struct iomap based DAX PMD fault path
Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
improved dax_iomap_pmd_fault().  Also, now that it has no more users,
remove xfs_get_blocks_dax_fault().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:35:02 +11:00
Ross Zwisler
642261ac99 dax: add struct iomap based DAX PMD support
DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking.  This patch allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled using the new struct
iomap based fault handlers.

There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX
mappings that have an associated block allocation, and 4k DAX empty
entries.  The empty entries exist to provide locking for the duration of a
given page fault.

This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP)
entries, PMD DAX entries that have associated block allocations, and 2 MiB
DAX empty entries.

Unlike the 4k case where we insert a struct page* into the radix tree for
4k zero pages, for HZP we insert a DAX exceptional entry with the new
RADIX_DAX_HZP flag set.  This is because we use a single 2 MiB zero page in
every 2MiB hole mapping, and it doesn't make sense to have that same struct
page* with multiple entries in multiple trees.  This would cause contention
on the single page lock for the one Huge Zero Page, and it would break the
page->index and page->mapping associations that are assumed to be valid in
many other places in the kernel.

One difficult use case is when one thread is trying to use 4k entries in
radix tree for a given offset, and another thread is using 2 MiB entries
for that same offset.  The current code handles this by making the 2 MiB
user fall back to 4k entries for most cases.  This was done because it is
the simplest solution, and because the use of 2MiB pages is already
opportunistic.

If we were to try to upgrade from 4k pages to 2MiB pages for a given range,
we run into the problem of how we lock out 4k page faults for the entire
2MiB range while we clean out the radix tree so we can insert the 2MiB
entry.  We can solve this problem if we need to, but I think that the cases
where both 2MiB entries and 4K entries are being used for the same range
will be rare enough and the gain small enough that it probably won't be
worth the complexity.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:34:45 +11:00
Ross Zwisler
422476c464 dax: move put_(un)locked_mapping_entry() in dax.c
No functional change.

The static functions put_locked_mapping_entry() and
put_unlocked_mapping_entry() will soon be used in error cases in
grab_mapping_entry(), so move their definitions above this function.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:33:44 +11:00
Ross Zwisler
fa28f7296a dax: move RADIX_DAX_* defines to dax.h
The RADIX_DAX_* defines currently mostly live in fs/dax.c, with just
RADIX_DAX_ENTRY_LOCK being in include/linux/dax.h so it can be used in
mm/filemap.c.  When we add PMD support, though, mm/filemap.c will also need
access to the RADIX_DAX_PTE type so it can properly construct a 4k sized
empty entry.

Instead of shifting the defines between dax.c and dax.h as they are
individually used in other code, just move them wholesale to dax.h so
they'll be available when we need them.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:33:35 +11:00
Ross Zwisler
1550290b08 dax: dax_iomap_fault() needs to call iomap_end()
Currently iomap_end() doesn't do anything for DAX page faults for both ext2
and XFS.  ext2_iomap_end() just checks for a write underrun, and
xfs_file_iomap_end() checks to see if it needs to finish a delayed
allocation.  However, in the future iomap_end() calls might be needed to
make sure we have balanced allocations, locks, etc.  So, add calls to
iomap_end() with appropriate error handling to dax_iomap_fault().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:33:26 +11:00
Ross Zwisler
333ccc978e dax: add dax_iomap_sector() helper function
To be able to correctly calculate the sector from a file position and a
struct iomap there is a complex little bit of logic that currently happens
in both dax_iomap_actor() and dax_iomap_fault().  This will need to be
repeated yet again in the DAX PMD fault handler when it is added, so break
it out into a helper function.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:33:09 +11:00
Ross Zwisler
11c59c92f4 dax: correct dax iomap code namespace
The recently added DAX functions that use the new struct iomap data
structure were named iomap_dax_rw(), iomap_dax_fault() and
iomap_dax_actor().  These are actually defined in fs/dax.c, though, so
should be part of the "dax" namespace and not the "iomap" namespace.
Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor()
respectively.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:32:46 +11:00
Ross Zwisler
b9fde0462e dax: remove dax_pmd_fault()
dax_pmd_fault() is the old struct buffer_head + get_block_t based 2 MiB DAX
fault handler.  This fault handler has been disabled for several kernel
releases, and support for PMDs will be reintroduced using the struct iomap
interface instead.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:32:35 +11:00
Ross Zwisler
63e95b5c4f dax: coordinate locking for offsets in PMD range
DAX radix tree locking currently locks entries based on the unique
combination of the 'mapping' pointer and the pgoff_t 'index' for the entry.
This works for PTEs, but as we move to PMDs we will need to have all the
offsets within the range covered by the PMD to map to the same bit lock.
To accomplish this, for ranges covered by a PMD entry we will instead lock
based on the page offset of the beginning of the PMD entry.  The 'mapping'
pointer is still used in the same way.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:32:20 +11:00
Ross Zwisler
e3ad61c64a dax: consistent variable naming for DAX entries
No functional change.

Consistently use the variable name 'entry' instead of 'ret' for DAX radix
tree entries.  This was already happening in most of the code, so update
get_unlocked_mapping_entry(), grab_mapping_entry() and
dax_unlock_mapping_entry().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:32:12 +11:00
Ross Zwisler
aada54f980 dax: remove the last BUG_ON() from fs/dax.c
Don't take down the kernel if we get an invalid 'from' and 'length'
argument pair.  Just warn once and return an error.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:32:00 +11:00
Ross Zwisler
ce95ab0fa6 dax: make 'wait_table' global variable static
The global 'wait_table' variable is only used within fs/dax.c, and
generates the following sparse warning:

fs/dax.c:39:19: warning: symbol 'wait_table' was not declared. Should it be static?

Make it static so it has scope local to fs/dax.c, and to make sparse happy.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:31:44 +11:00
Ross Zwisler
03e0990fc8 ext2: remove support for DAX PMD faults
DAX PMD support was added via the following commit:

commit e7b1ea2ad6 ("ext2: huge page fault support")

I believe this path to be untested as ext2 doesn't reliably provide block
allocations that are aligned to 2MiB.  In my testing I've been unable to
get ext2 to actually fault in a PMD.  It always fails with a "pfn
unaligned" message because the sector returned by ext2_get_block() isn't
aligned.

I've tried various settings for the "stride" and "stripe_width" extended
options to mkfs.ext2, without any luck.

Since we can't reliably get PMDs, remove support so that we don't have an
untested code path that we may someday traverse when we happen to get an
aligned block allocation.  This should also make 4k DAX faults in ext2 a
bit faster since they will no longer have to call the PMD fault handler
only to get a response of VM_FAULT_FALLBACK.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:31:33 +11:00
Ross Zwisler
fa0d3fce7c dax: remove buffer_size_valid()
Now that ext4 properly sets bh.b_size when we call get_block() for a hole,
rely on that value and remove the buffer_size_valid() sanity check.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:31:14 +11:00
Ross Zwisler
547edce3ba ext4: tell DAX the size of allocation holes
When DAX calls _ext4_get_block() and the file offset points to a hole we
currently don't set bh->b_size.  This is current worked around via
buffer_size_valid() in fs/dax.c.

_ext4_get_block() has the hole size information from ext4_map_blocks(), so
populate bh->b_size so we can remove buffer_size_valid() in a later patch.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08 11:30:58 +11:00
Linus Torvalds
fb415f222c Fixes for some recent regressions including fallout from the vmalloc'd
stack change (after which we can no longer encrypt stuff on the stack).
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYHNwpAAoJECebzXlCjuG++DMP/3mUUAF09DfFR/EHl7knDT1f
 kZ53UVHYzr02w0wXfwxVLlp2H7TdSAufgsSvPT6qksA3eY7gL6nJ9zHkl+Nv5yCx
 y6vsFWjO1QEUWFOZWCKcmT2dAI3Ddt9IhK13pfZEKN1XKvK2zWB16HEVzSg6fR2K
 NwHlpMnQUI4HWThURzwTZb1M5YhxRCAnyiv8BTAAPjbEfzPzdL7j3jxwqtH8bOWp
 qIcDDvjC744b9zy0YuAEY/NyGBhYZPdM6gWsBBes1TRzBWUL9qsUYTWDJTmg/F1l
 Or0Jz7CUEN9uOHLGnkATPDc+eBg9YFV+bSsSnJu1/W4Er7dX1Af/lol79zEp/Zw1
 Snd9FelSPj3vxmYAFTCLnHRTRgsyiDhbbb7gVrzH9bxnCrRNR6p2kY018s1Cl9Td
 uWQoNNFQwwnYxWYEeZdO5PgX+pcgoCzhHACNk5oA93YaBE0GuLHHugwwIrYE8TM1
 1iY20sLC5lJcnPqxdgnoprZnnHMuL6rx5KRbvBeflNZ4huK2PIcPJyeB83XH6s12
 G67PjJ0rfWzSBF14O/ZtQA6he+kXvnH3pKqpNnaMiBxZZ2J8E1eQvrKTLLIwmtlP
 18KKJpZIzh7jTTZ/99nAMAt/BGw97P9TToLdnI8dCxYygHEaywpEYtcsE8IWFAvA
 3XkS5QdlJhhAaAUUYBXy
 =oPbZ
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
 "Fixes for some recent regressions including fallout from the vmalloc'd
  stack change (after which we can no longer encrypt stuff on the
  stack)"

* tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix general protection fault in release_lock_stateid()
  svcrdma: backchannel cannot share a page for send and rcv buffers
  sunrpc: fix some missing rq_rbuffer assignments
  sunrpc: don't pass on-stack memory to sg_set_buf
  nfsd: move blocked lock handling under a dedicated spinlock
2016-11-04 20:12:10 -07:00
Linus Torvalds
46d7cbb2c4 Merge branch 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from Chris Mason:
 "Some fixes that Dave Sterba collected.  We held off on these last week
  because I was focused on the memory corruption testing"

* 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix WARNING in btrfs_select_ref_head()
  Btrfs: remove some no-op casts
  btrfs: pass correct args to btrfs_async_run_delayed_refs()
  btrfs: make file clone aware of fatal signals
  btrfs: qgroup: Prevent qgroup->reserved from going subzero
  Btrfs: kill BUG_ON in do_relocation
2016-11-04 20:08:16 -07:00
Linus Torvalds
bd30fac18f Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "Fix two more POSIX ACL bugs introduced in 4.8 and add a missing fsync
  during copy up to prevent possible data loss"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fsync after copy-up
  ovl: fix get_acl() on tmpfs
  ovl: update S_ISGID when setting posix ACLs
2016-11-04 20:03:14 -07:00
Chuck Lever
f46c445b79 nfsd: Fix general protection fault in release_lock_stateid()
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 15:24:43 -04:00
Miklos Szeredi
641089c154 ovl: fsync after copy-up
Make sure the copied up file hits the disk before renaming to the final
destination.  If this is not done then the copy-up may corrupt the data in
the file in case of a crash.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
2016-10-31 14:42:14 +01:00
Miklos Szeredi
b93d4a0eb3 ovl: fix get_acl() on tmpfs
tmpfs doesn't have ->get_acl() because it only uses cached acls.

This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer
of the overlay.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 39a25b2b37 ("ovl: define ->get_acl() for overlay inodes")
Cc: <stable@vger.kernel.org> # v4.8
2016-10-31 14:42:14 +01:00
Miklos Szeredi
fd3220d37b ovl: update S_ISGID when setting posix ACLs
This change fixes xfstest generic/375, which failed to clear the
setgid bit in the following test case on overlayfs:

  touch $testfile
  chown 100:100 $testfile
  chmod 2755 $testfile
  _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Fixes: d837a49bd5 ("ovl: fix POSIX ACL setting")
Cc: <stable@vger.kernel.org> # v4.8
2016-10-31 14:42:14 +01:00
Linus Torvalds
2a26d99b25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...
2016-10-29 20:33:20 -07:00
Linus Torvalds
efa563752c This pull request contains fixes for issues in both UBI and UBIFS:
- A regression wrt. overlayfs, introduced in -rc2.
 - An UBI issue, found by Dan Carpenter's static checker.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYFPHWAAoJEEtJtSqsAOnWcK4P/AwBcqPa0em/HXrdCExanQXY
 8U3uCPbDua4sW1Eaw5dVFoZuVoPzhibLLaVoVIWs8LOXiD8v23VYQ8ezu0D0O9fc
 cAsrxg0MtQLF/hyyVbdihxaqCB2H/j9PDJdIdCiRindPEwm0k6KBkVMk3N8O3m2U
 xDSA+Oq8Ns5cgjx+yfOhMJbGOFUzky26SV/M+PTAIU9Sj2w7RJS9R18BtWv4EFoK
 q1sT8aEte3kryb+v/a4s9RNzWOOHqRvZ4XizOMvma9I6uX6hOU4oeLknmJx1gPnb
 U5z75uAVn+IeNRnrco3pD91N3X9hEtv4IgZhFafNseVTY9MirDX5ss4th+XrSM6y
 wKgWEC8UmcV9Y7zDV/towZjhCipIh1yJPu3493IVHB/1UDPoNDfOGpK6NuhIEZHy
 1sNY8F2j3BBnLw6Fc2uC1FxM3a9MQ9CgJWQ0y9src73VNgQ8miz1WH2rsFp5DwNu
 HdZGBXGElmhbJbNFSsRqC1j+K0Y2LzL5BVOrBblkJNpUmxufRx0LIdXE7p4tPazq
 8dVOH/Ktx+mDQFbtyA8vXK+Cyyp0c/snR3BZo3AWLfrlip6iwZPG6arN4Wu6P4Nl
 ZFWUlHKaMJS/lvsdAuCdZ/lawRvENTOEQMORJR8U7CX/7gDLV1KiaFRpB3fFDUW5
 xm5r2qsbVzElu6skk4xk
 =eOKJ
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs

Pull ubi/ubifs fixes from Richard Weinberger:
 "This contains fixes for issues in both UBI and UBIFS:

   - A regression wrt overlayfs, introduced in -rc2.
   - An UBI issue, found by Dan Carpenter's static checker"

* tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix regression in ubifs_readdir()
  ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
2016-10-29 13:15:24 -07:00
Linus Torvalds
c636e176d8 driver core fixes for 4.9-rc3
Here are two small driver core / kernfs fixes for 4.9-rc3.  One makes
 the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more explicit that
 this is a crazy thing to enable for a distro kernel (thanks for trying
 Fedora!), the other resolves an issue with vim opening kernfs files
 (sysfs, configfs, etc.).
 
 Both have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iFYEABECABYFAlgU0CMPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspgv4AoJhR
 YJeG57ReBKjlzAj497Z1X7QcAJ9GXcbbbxmwj2IcUln5I3uEyuPCkQ==
 =pS6k
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two small driver core / kernfs fixes for 4.9-rc3.

  One makes the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more
  explicit that this is a crazy thing to enable for a distro kernel
  (thanks for trying Fedora!), the other resolves an issue with vim
  opening kernfs files (sysfs, configfs, etc.)

  Both have been in linux-next with no reported issues"

* tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Make Kconfig text for DEBUG_TEST_DRIVER_REMOVE stronger
  kernfs: Add noop_fsync to supported kernfs_file_fops
2016-10-29 10:57:40 -07:00
Linus Torvalds
f6167514c8 Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "My patch fixes the btrfs list_head abuse that we tracked down during
  Dave Jones' memory corruption investigation. With both Jens and my
  patches in place, I'm no longer able to trigger problems.

  Filipe is fixing a difficult old bug between snapshots, balance and
  send. Dave is cooking a few more for the next rc, but these are tested
  and ready"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix races on root_log_ctx lists
  btrfs: fix incremental send failure caused by balance
2016-10-28 10:07:35 -07:00
Richard Weinberger
a00052a296 ubifs: Fix regression in ubifs_readdir()
Commit c83ed4c9db ("ubifs: Abort readdir upon error") broke
overlayfs support because the fix exposed an internal error
code to VFS.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Fixes: c83ed4c9db ("ubifs: Abort readdir upon error")
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-28 14:48:31 +02:00
Linus Torvalds
14970f204b Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
  cris/arch-v32: cryptocop: print a hex number after a 0x prefix
  ipack: print a hex number after a 0x prefix
  block: DAC960: print a hex number after a 0x prefix
  fs: exofs: print a hex number after a 0x prefix
  lib/genalloc.c: start search from start of chunk
  mm: memcontrol: do not recurse in direct reclaim
  CREDITS: update credit information for Martin Kepplinger
  proc: fix NULL dereference when reading /proc/<pid>/auxv
  mm: kmemleak: ensure that the task stack is not freed during scanning
  lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
  latent_entropy: raise CONFIG_FRAME_WARN by default
  kconfig.h: remove config_enabled() macro
  ipc: account for kmem usage on mqueue and msg
  mm/slab: improve performance of gathering slabinfo stats
  mm: page_alloc: use KERN_CONT where appropriate
  mm/list_lru.c: avoid error-path NULL pointer deref
  h8300: fix syscall restarting
  kcov: properly check if we are in an interrupt
  mm/slab: fix kmemcg cache creation delayed issue
2016-10-27 19:58:39 -07:00
Uwe Kleine-König
14f947c87a fs: exofs: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Leon Yu
06b2849d10 proc: fix NULL dereference when reading /proc/<pid>/auxv
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL.  Fix that by checking for NULL mm
and bailing out early.  This is also the original behavior changed by
recent commit c531716785 ("proc: switch auxv to use of __mem_open()").

  # cat /proc/2/auxv
  Unable to handle kernel NULL pointer dereference at virtual address 000000a8
  Internal error: Oops: 17 [#1] PREEMPT SMP ARM
  CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
  Hardware name: BCM2709
  task: ea3b0b00 task.stack: e99b2000
  PC is at auxv_read+0x24/0x4c
  LR is at do_readv_writev+0x2fc/0x37c
  Process cat (pid: 113, stack limit = 0xe99b2210)
  Call chain:
    auxv_read
    do_readv_writev
    vfs_readv
    default_file_splice_read
    splice_direct_to_actor
    do_splice_direct
    do_sendfile
    SyS_sendfile64
    ret_fast_syscall

Fixes: c531716785 ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Janis Danisevskis <jdanis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Linus Torvalds
e3300ffef0 orangefs: a couple of cleanups sent in by other developers
use d_fsdata instead of d_time
     Miklos Szeredi <mszeredi@redhat.com>
 
   use file_inode(file) instead of file->f_path.dentry->d_inode
     Amir Goldstein <amir73il@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJYD65MAAoJEM9EDqnrzg2+kaUP/0HPDYJyWSgbGVSKuNqOiyml
 VbAGRbDAcpyYCFww2cRO9Xvvh6bJmGEqZUUbNxgi3q5L2KnvvoQ0jkHFfHaVii53
 uWP0WGrxBcRNxv72jfo1cBxYTcTqEfzXZBQb6HhzfbjMCvejbhSbYDowElTE7Oar
 AwcgEdv0Utm7zD/0K+OW56Q4fUYzOSFI4c/tNGUyjQCLE+N3R2roXdivz3maEfee
 uDg262lfQgkzbEYGJOdt8MpUak6YEp2bFa+Xf8bRoKMze8KbVDLwuTlYXuSdc/i8
 e8QO/Zr+irX/jJ/Sc998FwGquUljPuxz4wHSNEVO3HqYFIe30zkUD0mqQcxqx6YD
 F4DhSn8Ok5PuKv5aw1Q7AMA0Zd+bKaJzb/E0JdlHn1n9PFMiod82rdTfmGxP1rZb
 BwuOW/dsp/RLBZhCYpkNTBiNAH+TSIp8M7eOavO68AZ2zJXN69e/Qv2iJsaAZJZ0
 of+i9I4kmXUS4F6OPjgT6xJbH4aD/X4/jei4dPKDATXM0MW+GsZ7VodAYmAqGGCO
 l66UoL4o11BCMJNGfsdPxWJkUgpn4OBb+RSkS0f6qQ7Nlp1OaYeRYKNbX5ICHcgj
 A0PHXZ8Pub3iVgX5xUrQmYk3txbLt0ISDYBXzfPZ0rreztN0o5FRB4TNVLC82VwJ
 XHBdehhgLsNc1PMKSzZo
 =b9Ly
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull oreangefs updates from Mike Marshall:
 "A couple of orangefs cleanups sent in by other developers:

   - use d_fsdata instead of d_time (Miklos Szeredi)

   - use file_inode(file) instead of file->f_path.dentry->d_inode (Amir
     Goldstein)"

* tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: don't use d_time
  orangefs: user file_inode() where it is due
2016-10-27 12:52:46 -07:00
Linus Torvalds
e890038e6a xfs: updates for 4.9-rc3
Changes in this update:
 o iomap page offset masking fix for page faults
 o add IOMAP_REPORT to distinguish between read and fiemap map requests
 o cleanups to new shared data extent code
 o fix mount active status on failed log recovery
 o fix broken dquots in a buffer calculation
 o fix locking order issues and merge xfs_reflink_remap_range and
   xfs_file_share_range
 o rework unmapping of CoW extents and remove now unused functions
 o clean state when CoW is done.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYEfdWAAoJEK3oKUf0dfoddg4P/0Tl/i58sBL/Um90kSGOjxjI
 yaOKuFImS3MFSYDwiYADnXdhq6BgVLUJWS07t9/P6Nn3OZr1wBCZDZdyRS1+JwAA
 qOui4sp/v21HprydscN+BAdxyYmuo4yFgu9lkFFSM55yiaAb5C8hsYKF42Gja1+m
 gS40/Lsa5nauSz58UOZ5oEljAvBldAdyMlk8rVSGXVm7+pqs7Lxmhjif/Y8y/Y+i
 097auIrGk+oRDukXqhtZyCQ7VP99WzM+ksajtrNwVOOzSMhrcDCHKuLe0i4LsyjN
 UTx1ioY/AD8PUYhSmLqALD9vtFHnJbx50/MQFHNLc+hDQb2jb/jQmqx9LyEYDt38
 sw/Wy55hh9PylILdE//bWH0vSgqmnNCWviBUzjDtAJ9FKfv19slFlwtu2K4lOHoq
 C6Q2uh2mB7BC6efksk9DeA6/N9tFQuiXa48sN5+D2zMfZAmdkgzDCKfGrpRnS1Yl
 4h+sfiK/DTf11Q2nTaPAHylt02SmHsikQWvb5Fxu76UI8k4RsjCZc3ep/NUNJBlU
 E8f+cdNlAF5k/AWBY7107N1iUqL/vS2wXLdburJkckmQqRcI5WuRaLhi9g4tFjFI
 o+m9EM1WuOP6jeOuVImwgCRJoLVnTVKwee/d4J8y9Ad//Rs6B9pB0SIDfxJa9LY6
 B1XjT8z/NVyK6GsfP1Qs
 =LDDu
 -----END PGP SIGNATURE-----

Merge tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
 "This update contains fixes for most of the outstanding regressions
  introduced with the 4.9-rc1 XFS merge. There is also a fix for an
  iomap bug, too.

  This is a quite a bit larger than I'd prefer for a -rc3, but most of
  the change comes from cleaning up the new reflink copy on write code;
  it's much simpler and easier to understand now. These changes fixed
  several bugs in the new code, and it wasn't clear that there was an
  easier/simpler way to fix them. The rest of the fixes are the usual
  size you'd expect at this stage.

  I've left the commits to soak in linux-next for a some extra time
  because of the size before asking you to pull, no new problems with
  them have been reported so I think it's all OK.

  Summary:
   - iomap page offset masking fix for page faults
   - add IOMAP_REPORT to distinguish between read and fiemap map
     requests
   - cleanups to new shared data extent code
   - fix mount active status on failed log recovery
   - fix broken dquots in a buffer calculation
   - fix locking order issues and merge xfs_reflink_remap_range and
     xfs_file_share_range
   - rework unmapping of CoW extents and remove now unused functions
   - clean state when CoW is done"

* tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
  xfs: clear cowblocks tag when cow fork is emptied
  xfs: fix up inode cowblocks tracking tracepoints
  fs: Do to trim high file position bits in iomap_page_mkwrite_actor
  xfs: remove xfs_bunmapi_cow
  xfs: optimize xfs_reflink_end_cow
  xfs: optimize xfs_reflink_cancel_cow_blocks
  xfs: refactor xfs_bunmapi_cow
  xfs: optimize writes to reflink files
  xfs: don't bother looking at the refcount tree for reads
  xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
  xfs: add xfs_trim_extent
  iomap: add IOMAP_REPORT
  xfs: merge xfs_reflink_remap_range and xfs_file_share_range
  xfs: remove xfs_file_wait_for_io
  xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
  xfs: fix the same_inode check in xfs_file_share_range
  xfs: remove the same fs check from xfs_file_share_range
  libxfs: v3 inodes are only valid on crc-enabled filesystems
  libxfs: clean up _calc_dquots_per_chunk
  xfs: unset MS_ACTIVE if mount fails
  ...
2016-10-27 12:34:50 -07:00
Chris Mason
570dd45042 btrfs: fix races on root_log_ctx lists
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the
list because it knows all of the waiters are patiently waiting for the
commit to finish.

But, there's a small race where btrfs_sync_log can remove itself from
the list if it finds a log commit is already done.  Also, it uses
list_del_init() to remove itself from the list, but there's no way to
know if btrfs_remove_all_log_ctxs has already run, so we don't know for
sure if it is safe to call list_del_init().

This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and
just calls it with the proper locking.

This is part two of the corruption fixed by cbd60aa7cd.  I should have
done this in the first place, but convinced myself the optimizations were
safe.  A 12 hour run of dbench 2048 will eventually trigger a list debug
WARN_ON for the list_del_init() in btrfs_sync_log().

Fixes: d1433debe7
Reported-by: Dave Jones <davej@codemonkey.org.uk>
cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Chris Mason <clm@fb.com>
2016-10-27 10:42:20 -07:00
Tony Luck
2a9becdd4d kernfs: Add noop_fsync to supported kernfs_file_fops
If you edit a kernfs backed file with vi(1), you see an ugly error
message when you write the file because vi tries to fsync(2) the
file after writing, which fails.

We have noop_fsync() for this, use it.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-27 17:47:11 +02:00
Linus Torvalds
272ddc8b37 proc: don't use FOLL_FORCE for reading cmdline and environment
Now that Lorenzo cleaned things up and made the FOLL_FORCE users
explicit, it becomes obvious how some of them don't really need
FOLL_FORCE at all.

So remove FOLL_FORCE from the proc code that reads the command line and
arguments from user space.

The mem_rw() function actually does want FOLL_FORCE, because gdd (and
possibly many other debuggers) use it as a much more convenient version
of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
conditional on actually being a ptracer.  This does not actually do
that, just moves adds a comment to that effect and moves the gup_flags
settings next to each other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 19:00:44 -07:00
Jeff Layton
0cc11a61b8 nfsd: move blocked lock handling under a dedicated spinlock
Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a
rather complex situation involving four different spinlocks.

The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-24 16:51:21 -04:00
Miklos Szeredi
804b1737d7 orangefs: don't use d_time
Instead use d_fsdata which is the same size.  Hoping to get rid of d_time,
which is used by very few filesystems by this time.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-10-24 14:50:07 -04:00
Amir Goldstein
d62a9025ae orangefs: user file_inode() where it is due
Replace wrong use of file->f_path.dentry->d_inode with file_inode(file).
In case orangefs ever finds itself as an overelayfs layer, it would want
to get its own inode and not overlayfs's inode.

DISCLAIMER: I did not test this patch because I do not know how to setup
            an orangefs mount

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-10-24 14:29:39 -04:00
Wang Xiaoguang
9d1032cc49 btrfs: fix WARNING in btrfs_select_ref_head()
This issue was found when testing in-band dedupe enospc behaviour,
sometimes run_one_delayed_ref() may fail for enospc reason, then
__btrfs_run_delayed_refs()will return, but forget to add num_heads_read
back, which will trigger "WARN_ON(delayed_refs->num_heads_ready == 0)" in
btrfs_select_ref_head().

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-10-24 18:20:29 +02:00