Commit graph

35920 commits

Author SHA1 Message Date
J. Bruce Fields
f35ea0d4b6 Merge 3.15 bugfixes for 3.16 2014-05-22 15:48:15 -04:00
J. Bruce Fields
cbf7a75bc5 nfsd4: fix delegation cleanup on error
We're not cleaning up everything we need to on error.  In particular,
we're not removing our lease.  Among other problems this can cause the
struct nfs4_file used as fl_owner to be referenced after it has been
destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 12:17:17 -04:00
Kinglong Mee
368fe39b50 NFSD: Don't clear SUID/SGID after root writing data
We're clearing the SUID/SGID bits on write by hand in nfsd_vfs_write,
even though the subsequent vfs_writev() call will end up doing this for
us (through file system write methods eventually calling
file_remove_suid(), e.g., from __generic_file_aio_write).

So, remove the redundant nfsd code.

The only change in behavior is when the write is by root, in which case
we previously cleared SUID/SGID, but will now leave it alone.  The new
behavior is the behavior of every filesystem we've checked.

It seems better to be consistent with local filesystem behavior.  And
the security advantage seems limited as root could always restore these
bits by hand if it wanted.

SUID/SGID is not cleared after writing data with (root, local ext4),
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 4096   regular
empty file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.016029014 +0800
Change: 2014-04-18 21:36:31.026030285 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.040032065 +0800
Change: 2014-04-18 21:36:31.040032065 +0800
  Birth: -

With no_root_squash, (root, remote ext4), SUID/SGID are cleared,
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 262144 regular
empty file
Device: 24h/36d Inode: 786439      Links: 1
Access: (4777/-rwsrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.155805097 +0800
Change: 2014-04-18 21:45:32.168806749 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 262144 regular file
Device: 24h/36d Inode: 786439      Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.184808783 +0800
Change: 2014-04-18 21:45:32.184808783 +0800
  Birth: -

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 12:17:16 -04:00
J. Bruce Fields
27b11428b7 nfsd4: warn on finding lockowner without stateid's
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
J. Bruce Fields
a1b8ff4c97 nfsd4: remove lockowner when removing lock stateid
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
J. Bruce Fields
5513a510fa nfsd4: fix corruption on setting an ACL.
As of 06f9cc12ca "nfsd4: don't create
unnecessary mask acl", any non-trivial ACL will be left with an
unitialized entry, and a trivial ACL may write one entry beyond what's
allocated.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-15 15:36:04 -04:00
Kinglong Mee
9fa1959e97 NFSD: Get rid of empty function nfs4_state_init
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-08 14:59:52 -04:00
Kinglong Mee
f3e41ec5ef NFSD: Use simple_read_from_buffer for coping data to userspace
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-08 14:59:52 -04:00
J. Bruce Fields
dd15073a26 Merge 3.15 bugfix for 3.16 2014-05-08 14:59:06 -04:00
Christoph Hellwig
5409e46f1b nfsd: clean up fh_auth usage
Use fh_fsid when reffering to the fsid part of the filehandle.  The
variable length auth field envisioned in nfsfh wasn't ever implemented.
Also clean up some lose ends around this and document the file handle
format better.

Btw, why do we even export nfsfh.h to userspace?  The file handle very
much is kernel private, and nothing in nfs-utils include the header
either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-08 12:43:03 -04:00
Kinglong Mee
ecc7455d8e NFSD: cleanup unneeded including linux/export.h
commit 4ac7249ea5 have remove all EXPORT_SYMBOL,
linux/export.h is not needed, just clean it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-08 12:43:02 -04:00
Kinglong Mee
aa07c713ec NFSD: Call ->set_acl with a NULL ACL structure if no entries
After setting ACL for directory, I got two problems that caused
by the cached zero-length default posix acl.

This patch make sure nfsd4_set_nfs4_acl calls ->set_acl
with a NULL ACL structure if there are no entries.

Thanks for Christoph Hellwig's advice.

First problem:
............ hang ...........

Second problem:
[ 1610.167668] ------------[ cut here ]------------
[ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
[last unloaded: nfsd]
[ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
3.15.0-rc1+ #15
[ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
VirtualBox 12/01/2006
[ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
ffff88005a944000
[ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
_posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
[ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
[ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
0000000000000000
[ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
ffff880068233300
[ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
0000000000000000
[ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
ffff880068233300
[ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
ffff880068233300
[ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
knlGS:0000000000000000
[ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
00000000000006f0
[ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1610.168320] Stack:
[ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
0000000000000000
[ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
0000000000000002
[ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
000000065a945b68
[ 1610.168320] Call Trace:
[ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
[ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
[ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
[ 1610.168320]  [<ffffffffa0327962>] ?
nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
[ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
[ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
[ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
[nfsd]
[ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
[ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
[ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
[ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
[ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
[ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
[ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
[ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
[nfsd]
[ 1610.168320]  RSP <ffff88005a945b00>
[ 1610.257313] ---[ end trace 838254e3e352285b ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-08 12:42:21 -04:00
Trond Myklebust
14bcab1a39 NFSd: Clean up nfs4_preprocess_stateid_op
Move the state locking and file descriptor reference out from the
callers and into nfs4_preprocess_stateid_op() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-07 11:05:48 -04:00
Trond Myklebust
50cc62317d NFSd: Mark nfs4_free_lockowner and nfs4_free_openowner as static functions
They do not need to be used outside fs/nfsd/nfs4state.c

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:57 -04:00
Christoph Hellwig
6f226e2ab1 nfsd: remove <linux/nfsd/debug.h>
There is almost nothing left it in, just merge it into the only file
that includes it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:56 -04:00
Christoph Hellwig
7f94423e8f nfsd: move <linux/nfsd/stats.h> to fs/nfsd
There are no legitimate users outside of fs/nfsd, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:55 -04:00
Christoph Hellwig
d430e8d530 nfsd: move <linux/nfsd/export.h> to fs/nfsd
There are no legitimate users outside of fs/nfsd, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:54 -04:00
Christoph Hellwig
9c69de4c94 nfsd: remove <linux/nfsd/nfsfh.h>
The only real user of this header is fs/nfsd/nfsfh.h, so merge the
two.  Various lockѕ source files used it to indirectly get other
sunrpc or nfs headers, so fix those up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:53 -04:00
Trond Myklebust
4dd86e150f NFSd: Remove 'inline' designation for free_client()
It is large, it is used in more than one place, and it is not performance
critical. Let gcc figure out whether it should be inlined...

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:53 -04:00
Kees Cook
12dd7ecf23 lockd: avoid warning when CONFIG_SYSCTL undefined
When building without CONFIG_SYSCTL, the compiler saw an unused
label. This moves the label into the #ifdef it is used under.

fs/lockd/svc.c: In function ‘init_nlm’:
fs/lockd/svc.c:626:1: warning: label ‘err_sysctl’ defined but not used [-Wunused-label]

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 17:54:52 -04:00
Trond Myklebust
4cb57e3032 NFSd: call rpc_destroy_wait_queue() from free_client()
Mainly to ensure that we don't leave any hanging timers.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 12:38:49 -04:00
Trond Myklebust
5694c93e6c NFSd: Move default initialisers from create_client() to alloc_client()
Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-06 12:38:46 -04:00
J. Bruce Fields
fc208d026b Revert "nfsd4: fix nfs4err_resource in 4.1 case"
Since we're still limiting attributes to a page, the result here is that
a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE
instead of NFS4ERR_RESOURCE.

Both error returns are wrong, and the real bug here is the arbitrary
limit on getattr results, fixed by as-yet out-of-tree patches.  But at a
minimum we can make life easier for clients by sticking to one broken
behavior in released kernels instead of two....

Trond says:

	one immediate consequence of this patch will be that NFSv4.1
	clients will now report EIO instead of EREMOTEIO if they hit the
	problem. That may make debugging a little less obvious.

	Another consequence will be that if we ever do try to add client
	side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal
	with the “handle existing buggy server” syndrome.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:46:45 +02:00
Jeff Layton
3758cf7e14 nfsd: set timeparms.to_maxval in setup_callback_client
...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:34:31 +02:00
Jeff Layton
4991a628a7 locks: allow __break_lease to sleep even when break_time is 0
A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: <stable@vger.kernel.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-04-18 14:34:30 +02:00
Geert Uytterhoeven
e686bd8dc5 cifs: Use min_t() when comparing "size_t" and "unsigned long"
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":

  fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
  fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast

Introduced by commit 7f25bba819 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-13 14:10:26 -07:00
Linus Torvalds
454fd351f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull yet more networking updates from David Miller:

 1) Various fixes to the new Redpine Signals wireless driver, from
    Fariya Fatima.

 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
    Dmitry Petukhov.

 3) UFO and TSO packets differ in whether they include the protocol
    header in gso_size, account for that in skb_gso_transport_seglen().
   From Florian Westphal.

 4) If VLAN untagging fails, we double free the SKB in the bridging
    output path.  From Toshiaki Makita.

 5) Several call sites of sk->sk_data_ready() were referencing an SKB
    just added to the socket receive queue in order to calculate the
    second argument via skb->len.  This is dangerous because the moment
    the skb is added to the receive queue it can be consumed in another
    context and freed up.

    It turns out also that none of the sk->sk_data_ready()
    implementations even care about this second argument.

    So just kill it off and thus fix all these use-after-free bugs as a
    side effect.

 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.

 7) pktgen needs to do locking properly for LLTX devices, from Daniel
    Borkmann.

 8) xen-netfront driver initializes TX array entries in RX loop :-) From
    Vincenzo Maffione.

 9) After refactoring, some tunnel drivers allow a tunnel to be
    configured on top itself.  Fix from Nicolas Dichtel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vti: don't allow to add the same tunnel twice
  gre: don't allow to add the same tunnel twice
  drivers: net: xen-netfront: fix array initialization bug
  pktgen: be friendly to LLTX devices
  r8152: check RTL8152_UNPLUG
  net: sun4i-emac: add promiscuous support
  net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  net: ipv6: Fix oif in TCP SYN+ACK route lookup.
  drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
  drivers: net: cpsw: discard all packets received when interface is down
  net: Fix use after free by removing length arg from sk_data_ready callbacks.
  Drivers: net: hyperv: Address UDP checksum issues
  Drivers: net: hyperv: Negotiate suitable ndis version for offload support
  Drivers: net: hyperv: Allocate memory for all possible per-pecket information
  bridge: Fix double free and memory leak around br_allowed_ingress
  bonding: Remove debug_fs files when module init fails
  i40evf: program RSS LUT correctly
  i40evf: remove open-coded skb_cow_head
  ixgb: remove open-coded skb_cow_head
  igbvf: remove open-coded skb_cow_head
  ...
2014-04-12 17:31:22 -07:00
Linus Torvalds
96c57ade7e ceph: fix pr_fmt() redefinition
The vfs merge caused a latent bug to show up:

   In file included from fs/ceph/super.h:4:0,
                    from fs/ceph/ioctl.c:3:
   include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    ^
   In file included from include/linux/kernel.h:13:0,
                    from include/linux/uio.h:12,
                    from include/linux/socket.h:7,
                    from include/uapi/linux/in.h:22,
                    from include/linux/in.h:23,
                    from fs/ceph/ioctl.c:1:
   include/linux/printk.h:214:0: note: this is the location of the previous definition
    #define pr_fmt(fmt) fmt
    ^

where the reason is that <linux/ceph_debug.h> is included much too late
for the "pr_fmt()" define.

The include of <linux/ceph_debug.h> needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of <linux/kernel.h>.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-12 15:39:53 -07:00
Linus Torvalds
5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Linus Torvalds
0b747172dc Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
  AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
  audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
  audit: do not cast audit_rule_data pointers pointlesly
  AUDIT: Allow login in non-init namespaces
  audit: define audit_is_compat in kernel internal header
  kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
  sched: declare pid_alive as inline
  audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
  syscall_get_arch: remove useless function arguments
  audit: remove stray newline from audit_log_execve_info() audit_panic() call
  audit: remove stray newlines from audit_log_lost messages
  audit: include subject in login records
  audit: remove superfluous new- prefix in AUDIT_LOGIN messages
  audit: allow user processes to log from another PID namespace
  audit: anchor all pid references in the initial pid namespace
  audit: convert PPIDs to the inital PID namespace.
  pid: get pid_t ppid of task in init_pid_ns
  audit: rename the misleading audit_get_context() to audit_take_context()
  audit: Add generic compat syscall support
  audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
  ...
2014-04-12 12:38:53 -07:00
Al Viro
19dfc1f5f2 cifs: fix the race in cifs_writev()
O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.

Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-12 06:52:48 -04:00
Al Viro
eab87235c0 ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
ceph_osdc_put_request(ERR_PTR(-error)) oopses.  What we want there
is break, not goto out.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-12 06:51:51 -04:00
Linus Torvalds
a63b747b41 Merge git://git.kvack.org/~bcrl/aio-next
Pull aio ctx->ring_pages migration serialization fix from Ben LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: v4 ensure access to ctx->ring_pages is correctly serialised for migration
2014-04-11 16:36:50 -07:00
Linus Torvalds
3123bca719 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull second set of btrfs updates from Chris Mason:
 "The most important changes here are from Josef, fixing a btrfs
  regression in 3.14 that can cause corruptions in the extent allocation
  tree when snapshots are in use.

  Josef also fixed some deadlocks in send/recv and other assorted races
  when balance is running"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits)
  Btrfs: fix compile warnings on on avr32 platform
  btrfs: allow mounting btrfs subvolumes with different ro/rw options
  btrfs: export global block reserve size as space_info
  btrfs: fix crash in remount(thread_pool=) case
  Btrfs: abort the transaction when we don't find our extent ref
  Btrfs: fix EINVAL checks in btrfs_clone
  Btrfs: fix unlock in __start_delalloc_inodes()
  Btrfs: scrub raid56 stripes in the right way
  Btrfs: don't compress for a small write
  Btrfs: more efficient io tree navigation on wait_extent_bit
  Btrfs: send, build path string only once in send_hole
  btrfs: filter invalid arg for btrfs resize
  Btrfs: send, fix data corruption due to incorrect hole detection
  Btrfs: kmalloc() doesn't return an ERR_PTR
  Btrfs: fix snapshot vs nocow writting
  btrfs: Change the expanding write sequence to fix snapshot related bug.
  btrfs: make device scan less noisy
  btrfs: fix lockdep warning with reclaim lock inversion
  Btrfs: hold the commit_root_sem when getting the commit root during send
  Btrfs: remove transaction from send
  ...
2014-04-11 14:16:53 -07:00
David S. Miller
676d23690f net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-11 16:15:36 -04:00
Wang Shilong
e4fbaee292 Btrfs: fix compile warnings on on avr32 platform
fs/btrfs/scrub.c: In function 'get_raid56_logic_offset':
fs/btrfs/scrub.c:2269: warning: comparison of distinct pointer types lacks a cast
fs/btrfs/scrub.c:2269: warning: right shift count >= width of type
fs/btrfs/scrub.c:2269: warning: passing argument 1 of '__div64_32' from incompatible pointer type

Since @rot is an int type, we should not use do_div(), fix it.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-11 06:35:50 -07:00
Linus Torvalds
9e897e13bd Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull exofs updates from Boaz Harrosh:
 "Trivial updates to exofs for 3.15-rc1

  Just a few fixes sent by people"

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  MAINTAINERS: Update email address for bhalevy
  fs: Mark functions as static in exofs/ore_raid.c
  fs: Mark function as static in exofs/super.c
2014-04-10 14:33:02 -07:00
Harald Hoyer
0723a0473f btrfs: allow mounting btrfs subvolumes with different ro/rw options
Given the following /etc/fstab entries:

/dev/sda3 /mnt/foo btrfs subvol=foo,ro 0 0
/dev/sda3 /mnt/bar btrfs subvol=bar,rw 0 0

you can't issue:

$ mount /mnt/foo
$ mount /mnt/bar

You would have to do:

$ mount /mnt/foo
$ mount -o remount,rw /mnt/foo
$ mount --bind -o remount,ro /mnt/foo
$ mount /mnt/bar

or

$ mount /mnt/bar
$ mount --rw /mnt/foo
$ mount --bind -o remount,ro /mnt/foo

With this patch you can do

$ mount /mnt/foo
$ mount /mnt/bar

$ cat /proc/self/mountinfo
49 33 0:41 /foo /mnt/foo ro,relatime shared:36 - btrfs /dev/sda3 rw,ssd,space_cache
87 33 0:41 /bar /mnt/bar rw,relatime shared:74 - btrfs /dev/sda3 rw,ssd,space_cache

Signed-off-by: Chris Mason <clm@fb.com>
2014-04-10 13:32:50 -07:00
Linus Torvalds
dd76a786af Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A small collection of fixes that should go in before -rc1.  The pull
  request contains:

   - A two patch fix for a regression with block enabled tagging caused
     by a commit in the initial pull request.  One patch is from Martin
     and ensures that SCSI doesn't truncate 64-bit block flags, the
     other one is from me and prevents us from double using struct
     request queuelist for both completion and busy tags.  This caused
     anything from a boot crash for some, to crashes under load.

   - A blk-mq fix for a potential soft stall when hot unplugging CPUs
     with busy IO.

   - percpu_counter fix is listed in here, that caused a suspend issue
     with virtio-blk due to percpu counters having an inconsistent state
     during CPU removal.  Andrew sent this in separately a few days ago,
     but it's here.  JFYI.

   - A few fixes for block integrity from Martin.

   - A ratelimit fix for loop from Mike Galbraith, to avoid spewing too
     much in error cases"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix regression with block enabled tagging
  scsi: Make sure cmd_flags are 64-bit
  block: Ensure we only enable integrity metadata for reads and writes
  block: Fix integrity verification
  block: Fix for_each_bvec()
  drivers/block/loop.c: ratelimit error messages
  blk-mq: fix potential stall during CPU unplug with IO pending
  percpu_counter: fix bad counter state during suspend
2014-04-10 09:26:55 -07:00
Martin K. Petersen
e69f18f06b block: Ensure we only enable integrity metadata for reads and writes
We'd occasionally attempt to generate protection information for flushes
and other requests with a zero payload. Make sure we only attempt to
enable integrity for reads and writes.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-09 08:00:06 -06:00
Martin K. Petersen
0bc6997306 block: Fix integrity verification
Commit bf36f9cfa6 caused a regression by effectively reverting Nic's
fix from 5837c80e87 that ensures we traverse the full bio_vec list
upon completion.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-09 08:00:04 -06:00
Linus Torvalds
75ff24fa52 Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:
   - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
   - xdr fixes (a larger xdr rewrite has been posted but I decided it
     would be better to queue it up for 3.16).
   - miscellaneous fixes and cleanup from all over (thanks especially to
     Kinglong Mee)"

* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
  nfsd4: don't create unnecessary mask acl
  nfsd: revert v2 half of "nfsd: don't return high mode bits"
  nfsd4: fix memory leak in nfsd4_encode_fattr()
  nfsd: check passed socket's net matches NFSd superblock's one
  SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
  NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
  SUNRPC: New helper for creating client with rpc_xprt
  NFSD: Free backchannel xprt in bc_destroy
  NFSD: Clear wcc data between compound ops
  nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
  nfsd4: fix nfs4err_resource in 4.1 case
  nfsd4: fix setclientid encode size
  nfsd4: remove redundant check from nfsd4_check_resp_size
  nfsd4: use more generous NFS4_ACL_MAX
  nfsd4: minor nfsd4_replay_cache_entry cleanup
  nfsd4: nfsd4_replay_cache_entry should be static
  nfsd4: update comments with obsolete function name
  rpc: Allow xdr_buf_subsegment to operate in-place
  NFSD: Using free_conn free connection
  SUNRPC: fix memory leak of peer addresses in XPRT
  ...
2014-04-08 18:28:14 -07:00
Dan Carpenter
ffddc5fd19 fs/ncpfs/dir.c: fix indenting in ncp_lookup()
My static checker suggests adding curly braces here.  Probably that was
the intent, but actually the code works the same either way.  I've just
changed the indenting and left the code as-is.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:53 -07:00
Joe Perches
15a03ac6f8 ncpfs/inode.c: fix mismatch printk formats and arguments
Conversions to ncp_dbg showed some format/argument mismatches so fix
them.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:53 -07:00
Joe Perches
485b47f68c ncpfs: remove now unused PRINTK macro
Uses are gone, remove the macro.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches
e45ca8baa3 ncpfs: convert PPRINTK to ncp_vdbg
Use a more current logging style.

Convert the paranoia debug statement to vdbg.
Remove the embedded function names as dynamic_debug can do that.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches
d3b73ca1be ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
Use a more current logging style and enable use of dynamic debugging.

Remove embedded function names, dynamic debug can add this instead.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches
b41f8b84d0 ncpfs: Add pr_fmt and convert printks to pr_<level>
Convert to a more current logging style.

Add pr_fmt to prefix with "ncpfs: ".
Remove the embedded function names and use "%s: ", __func__

Some previously unprefixed messages now have "ncpfs: "

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Sasha Levin
e53d77eb8b autofs4: check dev ioctl size before allocating
There wasn't any check of the size passed from userspace before trying
to allocate the memory required.

This meant that userspace might request more space than allowed,
triggering an OOM.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:51 -07:00
Linus Torvalds
e9f37d3a8d Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Highlights:

   - drm:

     Generic display port aux features, primary plane support, drm
     master management fixes, logging cleanups, enforced locking checks
     (instead of docs), documentation improvements, minor number
     handling cleanup, pseudofs for shared inodes.

   - ttm:

     add ability to allocate from both ends

   - i915:

     broadwell features, power domain and runtime pm, per-process
     address space infrastructure (not enabled)

   - msm:

     power management, hdmi audio support

   - nouveau:

     ongoing GPU fault recovery, initial maxwell support, random fixes

   - exynos:

     refactored driver to clean up a lot of abstraction, DP support
     moved into drm, LVDS bridge support added, parallel panel support

   - gma500:

     SGX MMU support, SGX irq handling, asle irq work fixes

   - radeon:

     video engine bringup, ring handling fixes, use dp aux helpers

   - vmwgfx:

     add rendernode support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
  DRM: armada: fix corruption while loading cursors
  drm/dp_helper: don't return EPROTO for defers (v2)
  drm/bridge: export ptn3460_init function
  drm/exynos: remove MODULE_DEVICE_TABLE definitions
  ARM: dts: exynos4412-trats2: enable exynos/fimd node
  ARM: dts: exynos4210-trats: enable exynos/fimd node
  ARM: dts: exynos4412-trats2: add panel node
  ARM: dts: exynos4210-trats: add panel node
  ARM: dts: exynos4: add MIPI DSI Master node
  drm/panel: add S6E8AA0 driver
  ARM: dts: exynos4210-universal_c210: add proper panel node
  drm/panel: add ld9040 driver
  panel/ld9040: add DT bindings
  panel/s6e8aa0: add DT bindings
  drm/exynos: add DSIM driver
  exynos/dsim: add DT bindings
  drm/exynos: disallow fbdev initialization if no device is connected
  drm/mipi_dsi: create dsi devices only for nodes with reg property
  drm/mipi_dsi: add flags to DSI messages
  Skip intel_crt_init for Dell XPS 8700
  ...
2014-04-08 09:52:16 -07:00
Linus Torvalds
a7963eb7f4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
 "various cleanups for ext2, ext3, udf, isofs, a documentation update
  for quota, and a fix of a race in reiserfs readdir implementation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix race in readdir
  ext2: acl: remove unneeded include of linux/capability.h
  ext3: explicitly remove inode from orphan list after failed direct io
  fs/isofs/inode.c add __init to init_inodecache()
  ext3: Speedup WB_SYNC_ALL pass
  fs/quota/Kconfig: Update filesystems
  ext3: Update outdated comment before ext3_ordered_writepage()
  ext3: Update PF_MEMALLOC handling in ext3_write_inode()
  ext2/3: use prandom_u32() instead of get_random_bytes()
  ext3: remove an unneeded check in ext3_new_blocks()
  ext3: remove unneeded check in ext3_ordered_writepage()
  fs: Mark function as static in ext3/xattr_security.c
  fs: Mark function as static in ext3/dir.c
  fs: Mark function as static in ext2/xattr_security.c
  ext3: Add __init macro to init_inodecache
  ext2: Add __init macro to init_inodecache
  udf: Add __init macro to init_inodecache
  fs: udf: parse_options: blocksize check
2014-04-07 17:59:17 -07:00
Linus Torvalds
26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Christian Engelmayer
fe4487d18f fs/ufs: remove unused ufs_super_block_third pointer
Pointer 'usb3' to struct ufs_super_block_third acquired via
ubh_get_usb_third() is never used in function
ufs_read_cylinder_structures().  Thus remove it.

Detected by Coverity: CID 139939.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Christian Engelmayer
48968a112c fs/ufs: remove unused ufs_super_block_second pointer
Pointer 'usb2' to struct ufs_super_block_second acquired via
ubh_get_usb_second() is never used in function ufs_statfs().  Thus
remove it.

Detected by Coverity: CID 139940.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Christian Engelmayer
6e0bd34c33 fs/ufs: remove unused ufs_super_block_first pointer
Remove occurences of unused pointers to struct ufs_super_block_first
that were acquired via ubh_get_usb_first().

Detected by Coverity: CID 139929 - CID 139936, CID 139940.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Fabian Frederick
76ee473578 fs/ufs/super.c: add __init to init_inodecache()
init_inodecache is only called by __init init_ufs_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Dave Jones
16caed3196 fault-injection: set bounds on what /proc/self/make-it-fail accepts.
/proc/self/make-it-fail is a boolean, but accepts any number, including
negative ones.  Change variable to unsigned, and cap upper bound at 1.

[akpm@linux-foundation.org: don't make make_it_fail unsigned]
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:10 -07:00
Fabian Frederick
758b444075 fs/bfs/inode.c: add __init to init_inodecache()
init_inodecache is only called by __init init_bfs_fs

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
Fabian Frederick
8ca577223f affs: add mount option to avoid filename truncates
Normal behavior for filenames exceeding specific filesystem limits is to
refuse operation.

AFFS standard name length being only 30 characters against 255 for usual
Linux filesystems, original implementation does filename truncate by
default with a define value AFFS_NO_TRUNCATE which can be enabled but
needs module compilation.

This patch adds 'nofilenametruncate' mount option so that user can
easily activate that feature and avoid a lot of problems (eg overwrite
files ...)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
Fabian Frederick
d40c4d46ea fs/affs/dir.c: unlock/brelse dir on failure + code clean-up
Commit 0edf977d2a ("[readdir] convert affs") returns directly -EIO
without unlocking dir inode and releasing dir bh when second affs_bread
sequence fails.  This patch restores initial behaviour.  It also fixes
pr_debug and affs_error to fit in 80 columns + removes reference to
filldir (replaced by dir_emit in the commit above).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
Fabian Frederick
adbd319e5a affs: add __init to init_inodecache ()
init_inodecache is only called by __init init_affs_fs

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
Fabian Frederick
894122db49 fs/adfs/super.c: add __init to init_inodecache()
init_inodecache is only called by __init init_adfs_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
WANG Chao
c4082f36fa vmcore: continue vmcore initialization if PT_NOTE is found empty
Currently when an empty PT_NOTE is detected, vmcore initialization
fails.  It sounds too harsh.  Because PT_NOTE could be empty, for
example, one offlined a cpu but never restarted kdump service, and after
crash, PT_NOTE program header is there but no data contains.  It's
better to warn about the empty PT_NOTE and continue to initialise
vmcore.

And ultimately the multiple PT_NOTE are merged into a single one, all
empty PT_NOTE are discarded naturally during the merge.  So empty
PT_NOTE is not visible to user space and vmcore is as good as expected.

Signed-off-by: WANG Chao <chaowang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Greg Pearson <greg.pearson@hp.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:06 -07:00
Rashika Kheria
82e0703b6c include/linux/crash_dump.h: add vmcore_cleanup() prototype
Eliminate the following warning in proc/vmcore.c:

  fs/proc/vmcore.c:1088:6: warning: no previous prototype for `vmcore_cleanup' [-Wmissing-prototypes]

[akpm@linux-foundation.org: clean up powerpc, remove unneeded EXPORT_SYMBOL]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:06 -07:00
Oleg Nesterov
ad86622b47 wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space
get_task_state() uses the most significant bit to report the state to
user-space, this means that EXIT_ZOMBIE->EXIT_TRACE->EXIT_DEAD transition
can be noticed via /proc as Z -> X -> Z change.  Note that this was
possible even before EXIT_TRACE was introduced.

This is not really bad but imho it make sense to hide EXIT_TRACE from
user-space completely.  So the patch simply swaps EXIT_ZOMBIE and
EXIT_DEAD, this way EXIT_TRACE will be seen as EXIT_ZOMBIE by user-space.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:06 -07:00
Oleg Nesterov
23aebe1691 exec: kill bprm->tcomm[], simplify the "basename" logic
Starting from commit c4ad8f98be ("execve: use 'struct filename *' for
executable name passing") bprm->filename can not go away after
flush_old_exec(), so we do not need to save the binary name in
bprm->tcomm[] added by 96e02d1586 ("exec: fix use-after-free bug in
setup_new_exec()").

And there was never need for filename_to_taskname-like code, we can
simply do set_task_comm(kbasename(filename).

This patch has to change set_task_comm() and trace_task_rename() to
accept "const char *", but I think this change is also good.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:05 -07:00
Djalal Harouni
32ed74a4b9 procfs: make /proc/*/pagemap 0400
The /proc/*/pagemap contain sensitive information and currently its mode
is 0444.  Change this to 0400, so the VFS will prevent unprivileged
processes from getting file descriptors on arbitrary privileged
/proc/*/pagemap files.

This reduces the scope of address space leaking and bypasses by protecting
already running processes.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:05 -07:00
Djalal Harouni
35a35046e4 procfs: make /proc/*/{stack,syscall,personality} 0400
These procfs files contain sensitive information and currently their
mode is 0444.  Change this to 0400, so the VFS will be able to block
unprivileged processes from getting file descriptors on arbitrary
privileged /proc/*/{stack,syscall,personality} files.

This reduces the scope of ASLR leaking and bypasses by protecting already
running processes.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:04 -07:00
Monam Agarwal
1c44dbc82f fs/proc/inode.c: use RCU_INIT_POINTER(x, NULL)
Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.  And in the
case of the NULL pointer, there is no structure to initialize.  So,
rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:04 -07:00
Andrey Vagin
49d063cb35 proc: show mnt_id in /proc/pid/fdinfo
Currently we don't have a way how to determing from which mount point
file has been opened.  This information is required for proper dumping
and restoring file descriptos due to presence of mount namespaces.  It's
possible, that two file descriptors are opened using the same paths, but
one fd references mount point from one namespace while the other fd --
from other namespace.

$ ls -l /proc/1/fd/1
lrwx------ 1 root root 64 Mar 19 23:54 /proc/1/fd/1 -> /dev/null

$ cat /proc/1/fdinfo/1
pos:	0
flags:	0100002
mnt_id:	16

$ cat /proc/1/mountinfo | grep ^16
16 32 0:4 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=1013356k,nr_inodes=253339,mode=755

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Rob Landley <rob@landley.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:04 -07:00
Luiz Capitulino
f0b5664ba7 fs/proc/meminfo: meminfo_proc_show(): fix typo in comment
It should read "reclaimable slab" and not "reclaimable swap".

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:04 -07:00
Davidlohr Bueso
615d6e8756 mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Kirill A. Shutemov
f1820361f8 mm: implement ->map_pages for page cache
filemap_map_pages() is generic implementation of ->map_pages() for
filesystems who uses page cache.

It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Alex Thorlton
ab0e113f6b exec: kill the unnecessary mm->def_flags setting in load_elf_binary()
load_elf_binary() sets current->mm->def_flags = def_flags and def_flags
is always zero.  Not only this looks strange, this is unnecessary
because mm_init() has already set ->def_flags = 0.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Fabian Frederick
87c1b497c2 ntfs: logging clean-up
- Convert spinlock/static array to va_format (inspired by Joe Perches
  help on previous logging patches).

- Convert printk(KERN_ERR to pr_warn in __ntfs_warning.

- Convert printk(KERN_ERR to pr_err in __ntfs_error.

- Convert printk(KERN_DEBUG to pr_debug in __ntfs_debug.  (Note that
  __ntfs_debug is still guarded by #if DEBUG)

- Improve !DEBUG to parse all arguments (Joe Perches).

- Sparse pr_foo() conversions in super.c

NTFS, NTFS-fs prefixes as well as 'warning' and 'error' were removed :
pr_foo() automatically adds module name and error level is already
specified.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:49 -07:00
Linus Torvalds
240cd6a817 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "The biggest chunk is a series of patches from Ilya that add support
  for new Ceph osd and crush map features, including some new tunables,
  primary affinity, and the new encoding that is needed for erasure
  coding support.  This brings things into parity with the server side
  and the looming firefly release.  There is also support for allocation
  hints in RBD that help limit fragmentation on the server side.

  There is also a series of patches from Zheng fixing NFS reexport,
  directory fragmentation support, flock vs fnctl behavior, and some
  issues with clustered MDS.

  Finally, there are some miscellaneous fixes from Yunchuan Wen for
  fscache, Fabian Frederick for ACLs, and from me for fsync(dirfd)
  behavior"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (79 commits)
  ceph: skip invalid dentry during dcache readdir
  libceph: dump pool {read,write}_tier to debugfs
  libceph: output primary affinity values on osdmap updates
  ceph: flush cap release queue when trimming session caps
  ceph: don't grabs open file reference for aborted request
  ceph: drop extra open file reference in ceph_atomic_open()
  ceph: preallocate buffer for readdir reply
  libceph: enable PRIMARY_AFFINITY feature bit
  libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()
  libceph: add support for osd primary affinity
  libceph: add support for primary_temp mappings
  libceph: return primary from ceph_calc_pg_acting()
  libceph: switch ceph_calc_pg_acting() to new helpers
  libceph: introduce apply_temps() helper
  libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers
  libceph: ceph_can_shift_osds(pool) and pool type defines
  libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
  libceph: enable OSDMAP_ENC feature bit
  libceph: primary_affinity decode bits
  libceph: primary_affinity infrastructure
  ...
2014-04-07 11:09:13 -07:00
Linus Torvalds
3021112598 f2fs updates for v3.15
This patch-set includes the following major enhancement patches.
  o introduce large directory support
  o introduce f2fs_issue_flush to merge redundant flush commands
  o merge write IOs as much as possible aligned to the segment
  o add sysfs entries to tune the f2fs configuration
  o use radix_tree for the free_nid_list to reduce in-memory operations
  o remove costly bit operations in f2fs_find_entry
  o enhance the readahead flow for CP/NAT/SIT/SSA blocks
 
 The other bug fixes are as follows.
  o recover xattr node blocks correctly after sudden-power-cut
  o fix to calculate the maximum number of node ids
  o enhance to handle many error cases
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQiQrAAoJEEAUqH6CSFDSlbIP/iq06BrUeMDLoQFhA2GQFKFD
 wd0A5h9hCiFcKBcI/u/aAQqj/a5wdwzDl9XzH2PzJ45IM6sVGQZ0lv+kdLhab6rk
 ipNbV7G0yLAX+8ygS6GZF7pSKfMzGSGTrRvfdtoiunIip1jCY1IkUxv1XMgBSPza
 wnWYrE5HXEqRUDCqPXJyxrPmx0/0jw8/V82Ng9stnY34ySs+l/3Pvg65Kh0QuSSy
 BRjJUGlOCF68KUBKd+6YB2T5KlbQde3/5lhP+GMOi+xm5sFB+j+59r/WpJpF2Nxs
 ImxQs5GkiU01ErH/rn5FgHY/zzddQenBKwOvrjEeUA1eVpBurdsIr1JN0P6qDbgB
 ho5U8LzCQq+HZiW444eQGkXSOagpUKqDhTVJO7Fji/wG88Atc9gLX3ix8TH2skxT
 C5CvvrJM7DKBtkZyTzotKY/cWorOZhge6E/EkbGaM1sSHdK5b1Rg4YlFi9TDyz0n
 QjGD1uuvEeukeKGdIG9pjc7o5ledbMDYwLpT2RuRXenLOTsn8BqDOo9aRTg+5Kag
 tJNJLFumjPR2mEBNKjicJMUf381J/SKDwZszAz9mgvCZXldMza/Ax0LzJDJCVmkP
 UuBiVzGxVzpd33IsESUDr0J9hc+t8kS10jfAeKnE3cpb6n7/RYxstHh6CHOFKNXM
 gPUSYPN3CYiP47DnSfzA
 =eSW+
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce large directory support
   - introduce f2fs_issue_flush to merge redundant flush commands
   - merge write IOs as much as possible aligned to the segment
   - add sysfs entries to tune the f2fs configuration
   - use radix_tree for the free_nid_list to reduce in-memory operations
   - remove costly bit operations in f2fs_find_entry
   - enhance the readahead flow for CP/NAT/SIT/SSA blocks

  The other bug fixes are as follows:
   - recover xattr node blocks correctly after sudden-power-cut
   - fix to calculate the maximum number of node ids
   - enhance to handle many error cases

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
  f2fs: fix wrong statistics of inline data
  f2fs: check the acl's validity before setting
  f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
  f2fs: fix to cover io->bio with io_rwsem
  f2fs: fix error path when fail to read inline data
  f2fs: use list_for_each_entry{_safe} for simplyfying code
  f2fs: avoid free slab cache under spinlock
  f2fs: avoid unneeded lookup when xattr name length is too long
  f2fs: avoid unnecessary bio submit when wait page writeback
  f2fs: return -EIO when node id is not matched
  f2fs: avoid RECLAIM_FS-ON-W warning
  f2fs: skip unnecessary node writes during fsync
  f2fs: introduce fi->i_sem to protect fi's info
  f2fs: change reclaim rate in percentage
  f2fs: add missing documentation for dir_level
  f2fs: remove unnecessary threshold
  f2fs: throttle the memory footprint with a sysfs entry
  f2fs: avoid to drop nat entries due to the negative nr_shrink
  f2fs: call f2fs_wait_on_page_writeback instead of native function
  f2fs: introduce nr_pages_to_write for segment alignment
  ...
2014-04-07 10:55:36 -07:00
David Sterba
36523e9512 btrfs: export global block reserve size as space_info
Introduce a block group type bit for a global reserve and fill the space
info for SPACE_INFO ioctl. This should replace the newly added ioctl
(01e219e806) to get just the 'size' part
of the global reserve, while the actual usage can be now visible in the
'btrfs fi df' output during ENOSPC stress.

The unpatched userspace tools will show the blockgroup as 'unknown'.

CC: Jeff Mahoney <jeffm@suse.com>
CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 10:41:53 -07:00
Sergei Trofimovich
800ee2247f btrfs: fix crash in remount(thread_pool=) case
Reproducer:
    mount /dev/ubda /mnt
    mount -oremount,thread_pool=42 /mnt

Gives a crash:
    ? btrfs_workqueue_set_max+0x0/0x70
    btrfs_resize_thread_pool+0xe3/0xf0
    ? sync_filesystem+0x0/0xc0
    ? btrfs_resize_thread_pool+0x0/0xf0
    btrfs_remount+0x1d2/0x570
    ? kern_path+0x0/0x80
    do_remount_sb+0xd9/0x1c0
    do_mount+0x26a/0xbf0
    ? kfree+0x0/0x1b0
    SyS_mount+0xc4/0x110

It's a call
    btrfs_workqueue_set_max(fs_info->scrub_wr_completion_workers, new_pool_size);
with
    fs_info->scrub_wr_completion_workers = NULL;

as scrub wqs get created only on user's demand.

Patch skips not-created-yet workqueues.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
CC: Qu Wenruo <quwenruo@cn.fujitsu.com>
CC: Chris Mason <clm@fb.com>
CC: Josef Bacik <jbacik@fb.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 10:41:52 -07:00
Linus Torvalds
c29aa153ef MTD updates for 3.15:
- A few SPI NOR ID definitions
  - Kill the NAND "max pagesize" restriction
  - Fix some x16 bus-width NAND support
  - Add NAND JEDEC parameter page support
  - DT bindings for NAND ECC
  - GPMI NAND updates (subpage reads)
  - More OMAP NAND refactoring
  - New STMicro SPI NOR driver (now in 40 patches!)
  - A few other random bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTP6x+AAoJEFySrpd9RFgtit0P/jLWsjMK8G2ldPC4bMZsXmDF
 n3c71GcCRlUq4Qzb4rtZx9DANLAh+JyRrMOKCPg6dAMegFdmUqDOpZpNp0vF57KG
 myFjqTk+n5y0tfSkWLMUFt0tQ8ArDp3IBkQCUWkD5LgG50EWmjveIQGH0kFnkE39
 Kytqvw17RV7f81tIs+WvKt8++YWD2X1VTpTi0S4fx2bJ99bJDBf/GgdoQpj2oirt
 igXmloUFEsob9JHZ3qumcUm9vaHwv2TiouZTvRyGdJCCoPdpJEZO4Ka6e4uAVarT
 6kMKXBk3lj2GsilOSFFCNetXfy5Bf0TkJkv4rDjh3R1Y4J/hSgraVCbWXdKhb6tj
 RmwesdFMjsyS4f/Rhk5PXwJgGL9uK2mi6bk/SmXU0AMgCDSa5zjshY8Wq6C6uXwk
 LqlnK8l3h8Txotbc/XJIL+QGMbMkYQI8gxWTHFaqzDtkMe36mnGs9Zec3oso/s2d
 CNRpq5+dMZ6qF0z3zpOQHmFbaOekivMy7kCKMXer6ONsrQBNwTdmkwy+SdqnWsLF
 YdJttwV/RRcE0SRvK6GrhvzkGlV83Z8RPny6hC1kbrgQ0ffoy2CmIqyWNObK1RXf
 sYqoF8TCtR6Y8rHHi5dzZ1lria+nm8pb4+UfQLRI0mK8og7YW+fIfHXxqRrZZIrt
 No8NCPBKWzyew2UE0AiQ
 =CKih
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 - A few SPI NOR ID definitions
 - Kill the NAND "max pagesize" restriction
 - Fix some x16 bus-width NAND support
 - Add NAND JEDEC parameter page support
 - DT bindings for NAND ECC
 - GPMI NAND updates (subpage reads)
 - More OMAP NAND refactoring
 - New STMicro SPI NOR driver (now in 40 patches!)
 - A few other random bugfixes

* tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd: (120 commits)
  Fix index regression in nand_read_subpage
  mtd: diskonchip: mem resource name is not optional
  mtd: nand: fix mention to CONFIG_MTD_NAND_ECC_BCH
  mtd: nand: fix GET/SET_FEATURES address on 16-bit devices
  mtd: omap2: Use devm_ioremap_resource()
  mtd: denali_dt: Use devm_ioremap_resource()
  mtd: devices: elm: update DRIVER_NAME as "omap-elm"
  mtd: devices: elm: configure parallel channels based on ecc_steps
  mtd: devices: elm: clean elm_load_syndrome
  mtd: devices: elm: check for hardware engine's design constraints
  mtd: st_spi_fsm: Succinctly reorganise .remove()
  mtd: st_spi_fsm: Allow loop to run at least once before giving up CPU
  mtd: st_spi_fsm: Correct vendor name spelling issue - missing "M"
  mtd: st_spi_fsm: Avoid duplicating MTD core code
  mtd: st_spi_fsm: Remove useless consts from function arguments
  mtd: st_spi_fsm: Convert ST SPI FSM (NOR) Flash driver to new DT partitions
  mtd: st_spi_fsm: Move runtime configurable msg sequences into device's struct
  mtd: st_spi_fsm: Supply the W25Qxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the S25FLxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the MX25xxx chip specific configuration call-back
  ...
2014-04-07 10:17:30 -07:00
Josef Bacik
c4a050bbbb Btrfs: abort the transaction when we don't find our extent ref
I'm not sure why we weren't aborting here in the first place, it is obviously a
bad time from the fact that we print the leaf and yell loudly about it.  Fix
this up, otherwise we panic because our path could be pointing into oblivion.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:51 -07:00
Chris Mason
3a29bc0928 Btrfs: fix EINVAL checks in btrfs_clone
btrfs_drop_extents can now return -EINVAL, but only one caller
in btrfs_clone was checking for it.  This adds it to the
caller for inline extents, which is where we really need it.

Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:50 -07:00
Wang Shilong
a1ecaabbf9 Btrfs: fix unlock in __start_delalloc_inodes()
This patch fix a regression caused by the following patch:
Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock

break while loop will make us call @spin_unlock() without
calling @spin_lock() before, fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:50 -07:00
Wang Shilong
3b080b2564 Btrfs: scrub raid56 stripes in the right way
Steps to reproduce:
 # mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5
 # mount /dev/sda8 /mnt
 # btrfs scrub start -BR /mnt
 # echo $? <--unverified errors make return value be 3

This is because we don't setup right mapping between physical
and logical address for raid56, which makes checksum mismatch.
But we will find everthing is fine later when rechecking using
btrfs_map_block().

This patch fixed the problem by settuping right mappings and
we only verify data stripes' checksums.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:49 -07:00
Wang Shilong
68bb462d42 Btrfs: don't compress for a small write
To compress a small file range(<=blocksize) that is not
an inline extent can not save disk space at all. skip it can
save us some cpu time.

This patch can also fix wrong setting nocompression flag for
inode, say a case when @total_in is 4096, and then we get
@total_compressed 52,because we do aligment to page cache size
firstly, and then we get into conclusion @total_in=@total_compressed
thus we will clear this inode's compression flag.

An exception comes from inserting inline extent failure but we
still have @total_compressed < @total_in,so we will still reset
inode's flag, this is ok, because we don't have good compression
effect.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:48 -07:00
Filipe Manana
c50d3e71c3 Btrfs: more efficient io tree navigation on wait_extent_bit
If we don't reschedule use rb_next to find the next extent state
instead of a full tree search, which is more efficient and safe
since we didn't release the io tree's lock.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:47 -07:00
Filipe Manana
c715e155c9 Btrfs: send, build path string only once in send_hole
There's no point building the path string in each iteration of the
send_hole loop, as it produces always the same string.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:46 -07:00
Gui Hecheng
9a40f1222a btrfs: filter invalid arg for btrfs resize
Originally following cmds will work:
	# btrfs fi resize -10A  <mnt>
	# btrfs fi resize -10Gaha <mnt>
Filter the arg by checking the return pointer of memparse.

Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:45 -07:00
Filipe Manana
766b5e5ae7 Btrfs: send, fix data corruption due to incorrect hole detection
During an incremental send, when we finish processing an inode (corresponding to
a regular file) we would assume the gap between the end of the last processed file
extent and the file's size corresponded to a file hole, and therefore incorrectly
send a bunch of zero bytes to overwrite that region in the file.

This affects only kernel 3.14.

Reproducer:

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt

    xfs_io -f -c "falloc -k 0 268435456" /mnt/foo

    btrfs subvolume snapshot -r /mnt /mnt/mysnap0

    xfs_io -c "pwrite -S 0x01 -b 9216 16190218 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x02 -b 1121 198720104 1121" /mnt/foo
    xfs_io -c "pwrite -S 0x05 -b 9216 107887439 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x06 -b 9216 225520207 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x07 -b 67584 102138300 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x08 -b 7000 94897484 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x09 -b 113664 245083212 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x10 -b 123 17937788 123" /mnt/foo
    xfs_io -c "pwrite -S 0x11 -b 39936 229573311 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x12 -b 67584 174792222 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x13 -b 9216 249253213 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x16 -b 67584 150046083 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x17 -b 39936 118246040 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x18 -b 67584 215965442 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x19 -b 33792 97096725 33792" /mnt/foo
    xfs_io -c "pwrite -S 0x20 -b 125952 166300596 125952" /mnt/foo
    xfs_io -c "pwrite -S 0x21 -b 123 1078957 123" /mnt/foo
    xfs_io -c "pwrite -S 0x25 -b 9216 212044492 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x26 -b 7000 265037146 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x27 -b 42757 215922685 42757" /mnt/foo
    xfs_io -c "pwrite -S 0x28 -b 7000 69865411 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x29 -b 67584 67948958 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x30 -b 39936 266967019 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x31 -b 1121 19582453 1121" /mnt/foo
    xfs_io -c "pwrite -S 0x32 -b 17408 257710255 17408" /mnt/foo
    xfs_io -c "pwrite -S 0x33 -b 39936 3895518 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x34 -b 125952 12045847 125952" /mnt/foo
    xfs_io -c "pwrite -S 0x35 -b 17408 19156379 17408" /mnt/foo
    xfs_io -c "pwrite -S 0x36 -b 39936 50160066 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x37 -b 113664 9549793 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x38 -b 105472 94391506 105472" /mnt/foo
    xfs_io -c "pwrite -S 0x39 -b 23552 143632863 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x40 -b 39936 241283845 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x41 -b 113664 199937606 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x42 -b 67584 67380093 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x43 -b 67584 26793129 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x44 -b 39936 14421913 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x45 -b 123 253097405 123" /mnt/foo
    xfs_io -c "pwrite -S 0x46 -b 1121 128233424 1121" /mnt/foo
    xfs_io -c "pwrite -S 0x47 -b 105472 91577959 105472" /mnt/foo
    xfs_io -c "pwrite -S 0x48 -b 1121 7245381 1121" /mnt/foo
    xfs_io -c "pwrite -S 0x49 -b 113664 182414694 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x50 -b 9216 32750608 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x51 -b 67584 266546049 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x52 -b 67584 87969398 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x53 -b 9216 260848797 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x54 -b 39936 119461243 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x55 -b 7000 200178693 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x56 -b 9216 243316029 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x57 -b 7000 209658229 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x58 -b 101376 179745192 101376" /mnt/foo
    xfs_io -c "pwrite -S 0x59 -b 9216 64012300 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x60 -b 125952 181705139 125952" /mnt/foo
    xfs_io -c "pwrite -S 0x61 -b 23552 235737348 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x62 -b 113664 106021355 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x63 -b 67584 135753552 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x64 -b 23552 95730888 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x65 -b 11 17311415 11" /mnt/foo
    xfs_io -c "pwrite -S 0x66 -b 33792 120695553 33792" /mnt/foo
    xfs_io -c "pwrite -S 0x67 -b 9216 17164631 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x68 -b 9216 136065853 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x69 -b 67584 37752198 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x70 -b 101376 189717473 101376" /mnt/foo
    xfs_io -c "pwrite -S 0x71 -b 7000 227463698 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x72 -b 9216 12655137 9216" /mnt/foo
    xfs_io -c "pwrite -S 0x73 -b 7000 7488866 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x74 -b 113664 87813649 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x75 -b 33792 25802183 33792" /mnt/foo
    xfs_io -c "pwrite -S 0x76 -b 39936 93524024 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x77 -b 33792 113336388 33792" /mnt/foo
    xfs_io -c "pwrite -S 0x78 -b 105472 184955320 105472" /mnt/foo
    xfs_io -c "pwrite -S 0x79 -b 101376 225691598 101376" /mnt/foo
    xfs_io -c "pwrite -S 0x80 -b 23552 77023155 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x81 -b 11 201888192 11" /mnt/foo
    xfs_io -c "pwrite -S 0x82 -b 11 115332492 11" /mnt/foo
    xfs_io -c "pwrite -S 0x83 -b 67584 230278015 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x84 -b 11 120589073 11" /mnt/foo
    xfs_io -c "pwrite -S 0x85 -b 125952 202207819 125952" /mnt/foo
    xfs_io -c "pwrite -S 0x86 -b 113664 86672080 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x87 -b 17408 208459603 17408" /mnt/foo
    xfs_io -c "pwrite -S 0x88 -b 7000 73372211 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x89 -b 7000 42252122 7000" /mnt/foo
    xfs_io -c "pwrite -S 0x90 -b 23552 46784881 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x91 -b 101376 63172351 101376" /mnt/foo
    xfs_io -c "pwrite -S 0x92 -b 23552 59341931 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x93 -b 39936 239599283 39936" /mnt/foo
    xfs_io -c "pwrite -S 0x94 -b 67584 175643105 67584" /mnt/foo
    xfs_io -c "pwrite -S 0x97 -b 23552 105534880 23552" /mnt/foo
    xfs_io -c "pwrite -S 0x98 -b 113664 8236844 113664" /mnt/foo
    xfs_io -c "pwrite -S 0x99 -b 125952 144489686 125952" /mnt/foo
    xfs_io -c "pwrite -S 0xa0 -b 7000 73273112 7000" /mnt/foo
    xfs_io -c "pwrite -S 0xa1 -b 125952 194580243 125952" /mnt/foo
    xfs_io -c "pwrite -S 0xa2 -b 123 56296779 123" /mnt/foo
    xfs_io -c "pwrite -S 0xa3 -b 11 233066845 11" /mnt/foo
    xfs_io -c "pwrite -S 0xa4 -b 39936 197727090 39936" /mnt/foo
    xfs_io -c "pwrite -S 0xa5 -b 101376 53579812 101376" /mnt/foo
    xfs_io -c "pwrite -S 0xa6 -b 9216 85669738 9216" /mnt/foo
    xfs_io -c "pwrite -S 0xa7 -b 125952 21266322 125952" /mnt/foo
    xfs_io -c "pwrite -S 0xa8 -b 23552 125726568 23552" /mnt/foo
    xfs_io -c "pwrite -S 0xa9 -b 9216 18423680 9216" /mnt/foo
    xfs_io -c "pwrite -S 0xb0 -b 1121 165901483 1121" /mnt/foo

    btrfs subvolume snapshot -r /mnt /mnt/mysnap1

    xfs_io -c "pwrite -S 0xff -b 10 16190218 10" /mnt/foo

    btrfs subvolume snapshot -r /mnt /mnt/mysnap2

    md5sum /mnt/foo          # returns 79e53f1466bfc09fd82b450689e6119e
    md5sum /mnt/mysnap2/foo  # returns 79e53f1466bfc09fd82b450689e6119e too

    btrfs send /mnt/mysnap1 -f /tmp/1.snap
    btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap

    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt

    btrfs receive /mnt -f /tmp/1.snap
    btrfs receive /mnt -f /tmp/2.snap

    md5sum /mnt/mysnap2/foo  # returns 2bb414c5155767cedccd7063e51beabd !!

A testcase for xfstests follows soon too.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:45 -07:00
Dan Carpenter
84dbeb87d1 Btrfs: kmalloc() doesn't return an ERR_PTR
The error handling was copy and pasted from memdup_user().  It should be
checking for NULL obviously.

Fixes: abccd00f8a ('btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:44 -07:00
Wang Shilong
e9894fd3e3 Btrfs: fix snapshot vs nocow writting
While running fsstress and snapshots concurrently, we will hit something
like followings:

Thread 1			Thread 2

|->fallocate
  |->write pages
    |->join transaction
       |->add ordered extent
    |->end transaction
				|->flushing data
				  |->creating pending snapshots
|->write data into src root's
   fallocated space

After above work flows finished, we will get a state that source and
snapshot root share same space, but source root have written data into
fallocated space, this will make fsck fail to verify checksums for
snapshot root's preallocating file extent data.Nocow writting also
has this same problem.

Fix this problem by syncing snapshots with nocow writting:

 1.for nocow writting,if there are pending snapshots, we will
 fall into COW way.

 2.if there are pending nocow writes, snapshots for this root
 will be blocked until nocow writting finish.

Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:43 -07:00
Qu Wenruo
3ac0d7b96a btrfs: Change the expanding write sequence to fix snapshot related bug.
When testing fsstress with snapshot making background, some snapshot
following problem.

Snapshot 270:
inode 323: size 0

Snapshot 271:
inode 323: size 349145
|-------Hole---|---------Empty gap-------|-------Hole-----|
0	    122880			172032	      349145

Snapshot 272:
inode 323: size 349145
|-------Hole---|------------Data---------|-------Hole-----|
0	    122880			172032	      349145

The fsstress operation on inode 323 is the following:
write: 		offset 	126832 	len 43124
truncate: 	size 	349145

Since the write with offset is consist of 2 operations:
1. punch hole
2. write data
Hole punching is faster than data write, so hole punching in write
and truncate is done first and then buffered write, so the snapshot 271 got
empty gap, which will not pass btrfsck.

To fix the bug, this patch will change the write sequence which will
first punch a hole covering the write end if a hole is needed.

Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:42 -07:00
David Sterba
60999ca4b4 btrfs: make device scan less noisy
Print the message only when the device is seen for the first time.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:41 -07:00
Jeff Mahoney
ed55b6ac07 btrfs: fix lockdep warning with reclaim lock inversion
When encountering memory pressure, testers have run into the following
lockdep warning. It was caused by __link_block_group calling kobject_add
with the groups_sem held. kobject_add calls kvasprintf with GFP_KERNEL,
which gets us into reclaim context. The kobject doesn't actually need
to be added under the lock -- it just needs to ensure that it's only
added for the first block group to be linked.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.14.0-rc8-default #1 Not tainted
---------------------------------------------------------
kswapd0/169 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa018baea>] __btrfs_release_delayed_node+0x3a/0x200 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (&found->groups_sem){+++++.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&found->groups_sem);
                               local_irq_disable();
                               lock(&delayed_node->mutex);
                               lock(&found->groups_sem);
  <Interrupt>
    lock(&delayed_node->mutex);

 *** DEADLOCK ***
2 locks held by kswapd0/169:
 #0:  (shrinker_rwsem){++++..}, at: [<ffffffff81159e8a>] shrink_slab+0x3a/0x160
 #1:  (&type->s_umount_key#27){++++..}, at: [<ffffffff811bac6f>] grab_super_passive+0x3f/0x90

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:40 -07:00
Josef Bacik
3f8a18cc53 Btrfs: hold the commit_root_sem when getting the commit root during send
We currently rely too heavily on roots being read-only to save us from just
accessing root->commit_root.  We can easily balance blocks out from underneath a
read only root, so to save us from getting screwed make sure we only access
root->commit_root under the commit root sem.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-07 09:08:39 -07:00
Chao Yu
48b230a583 f2fs: fix wrong statistics of inline data
If we remove a file that has inline data after mount, our statistics turns to
inaccurate.

cat /sys/kernel/debug/f2fs/status
  - Inline_data Inode: 4294967295

Let's add stat_inc_inline_inode() to stat inline info of the file when lookup.

Change log from v1:
 o stat in f2fs_lookup() instead of in do_read_inode() for excluding wrong stat.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 12:40:58 +09:00
ZhangZhen
3a8861e271 f2fs: check the acl's validity before setting
Before setting the acl, call posix_acl_valid() to check if it is
valid or not.

Signed-off-by: zhangzhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 12:18:30 +09:00
Jaegeuk Kim
6b4afdd794 f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is prettry much high. In such
the case, it needs to merge cache_flush commands as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate such
the overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 09:50:58 +09:00
Josef Bacik
9e351cc862 Btrfs: remove transaction from send
Lets try this again.  We can deadlock the box if we send on a box and try to
write onto the same fs with the app that is trying to listen to the send pipe.
This is because the writer could get stuck waiting for a transaction commit
which is being blocked by the send.  So fix this by making sure looking at the
commit roots is always going to be consistent.  We do this by keeping track of
which roots need to have their commit roots swapped during commit, and then
taking the commit_root_sem and swapping them all at once.  Then make sure we
take a read lock on the commit_root_sem in cases where we search the commit root
to make sure we're always looking at a consistent view of the commit roots.
Previously we had problems with this because we would swap a fs tree commit root
and then swap the extent tree commit root independently which would cause the
backref walking code to screw up sometimes.  With this patch we no longer
deadlock and pass all the weird send/receive corner cases.  Thanks,

Reportedy-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-06 17:39:30 -07:00
Josef Bacik
a26e8c9f75 Btrfs: don't clear uptodate if the eb is under IO
So I have an awful exercise script that will run snapshot, balance and
send/receive in parallel.  This sometimes would crash spectacularly and when it
came back up the fs would be completely hosed.  Turns out this is because of a
bad interaction of balance and send/receive.  Send will hold onto its entire
path for the whole send, but its blocks could get relocated out from underneath
it, and because it doesn't old tree locks theres nothing to keep this from
happening.  So it will go to read in a slot with an old transid, and we could
have re-allocated this block for something else and it could have a completely
different transid.  But because we think it is invalid we clear uptodate and
re-read in the block.  If we do this before we actually write out the new block
we could write back stale data to the fs, and boom we're screwed.

Now we definitely need to fix this disconnect between send and balance, but we
really really need to not allow ourselves to accidently read in stale data over
new data.  So make sure we check if the extent buffer is not under io before
clearing uptodate, this will kick back EIO to the caller instead of reading in
stale data and keep us from corrupting the fs.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-04-06 17:34:37 -07:00