Commit graph

456762 commits

Author SHA1 Message Date
J. Bruce Fields
95ad5c2913 dcache: d_splice_alias should detect loops
I believe this can only happen in the case of a corrupted filesystem.
So -EIO looks like the appropriate error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:11 -04:00
J. Bruce Fields
9635389534 exportfs: update Exporting documentation
Minor documentation updates:
	- refer to d_obtain_alias rather than d_alloc_anon
	- explain when to use d_splice_alias and when
	  d_materialise_unique.
	- cut some details of d_splice_alias/d_materialise_unique
	  implementation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
8d80d7dabe dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
If we get to this point and discover the dentry is not a root dentry, or
not DCACHE_DISCONNECTED--great, we always prefer that anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
52ed46f0fa dcache: remove unused d_find_alias parameter
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
1a0a397e41 dcache: d_obtain_alias callers don't all want DISCONNECTED
There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12 "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
da093a9b76 dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
Any IS_ROOT() alias should be safe to use; there's nothing special about
DCACHE_DISCONNECTED dentries.

Note that this is in fact useful for filesystems such as btrfs which can
legimately encounter a directory with a preexisting IS_ROOT alias on a
lookup that crosses into a subvolume.  (Those aliases are currently
marked DCACHE_DISCONNECTED--but not really for any good reason, and
we'll change that soon.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
908790fa3b dcache: d_splice_alias mustn't create directory aliases
Currently if d_splice_alias finds a directory with an alias that is not
IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.

Duplicate directory dentries are unacceptable; it is better just to
error out.

(In the case of a local filesystem the most likely case is filesystem
corruption: for example, perhaps two directories point to the same child
directory, and the other parent has already been found and cached.)

Note that distributed filesystems may encounter this case in normal
operation if a remote host moves a directory to a location different
from the one we last cached in the dcache.  For that reason, such
filesystems should instead use d_materialise_unique, which tries to move
the old directory alias to the right place instead of erroring out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
75a2352d01 dcache: close d_move race in d_splice_alias
d_splice_alias will d_move an IS_ROOT() directory dentry into place if
one exists.  This should be safe as long as the dentry remains IS_ROOT,
but I can't see what guarantees that: once we drop the i_lock all we
hold here is the i_mutex on an unrelated parent directory.

Instead copy the logic of d_materialise_unique.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
3f70bd51cb dcache: move d_splice_alias
Just a trivial move to locate it near (similar) d_materialise_unique
code and save some forward references in a following patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
J. Bruce Fields
d03b29a271 namei: trivial fix to vfs_rename_dir comment
Looks like the directory loop check is actually done in renameat?
Whatever, leave this out rather than trying to keep it up to date with
the code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
NeilBrown
b8faf035ea VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
In REF-walk mode, ->d_manage can return -EISDIR to indicate
that the dentry is not really a mount trap (or even a mount point)
and that any mounts or any DCACHE_NEED_AUTOMOUNT flag should be
ignored.

RCU-walk mode doesn't currently support this, so if there is a dentry
with DCACHE_NEED_AUTOMOUNT set but which shouldn't be a mount-trap,
lookup_fast() will always drop in REF-walk mode.

With this patch, an -EISDIR from ->d_manage will always cause mounts
and automounts to be ignored, both in REF-walk and RCU-walk.

Bug-fixed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
Miklos Szeredi
7c33d5972c cifs: support RENAME_NOREPLACE
This flag gives CIFS the ability to support its native rename semantics.

Implementation is simple: just bail out before trying to hack around the
noreplace semantics.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
9a423bb6e3 hostfs: support rename flags
Support RENAME_NOREPLACE and RENAME_EXCHANGE flags on hostfs if the
underlying filesystem supports it.

Since renameat2(2) is not yet in any libc, use syscall(2) to invoke the
renameat2 syscall.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
37456771c5 shmem: support RENAME_EXCHANGE
This is really simple in tmpfs since the VFS already takes care of
shuffling the dentries.  Just adjust nlink on parent directories and touch
c & mtimes.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
3b69ff51d0 shmem: support RENAME_NOREPLACE
Implement ->rename2 instead of ->rename.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
80ace85c91 btrfs: add RENAME_NOREPLACE
RENAME_NOREPLACE is trivial to implement for most filesystems: switch over
to ->rename2() and check for the supported flags.  The rest is done by the
VFS.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
a0dbc56610 bad_inode: add ->rename2()
so we return -EIO instead of -EINVAL.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Miklos Szeredi
7177a9c4b5 fs: call rename2 if exists
Christoph Hellwig suggests:

1) make vfs_rename call ->rename2 if it exists instead of ->rename
2) switch all filesystems that you're adding NOREPLACE support for to
   use ->rename2
3) see how many ->rename instances we'll have left after a few
   iterations of 2.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Ionut Alexa
2577d92ebd kernel/acct.c: fix coding style warnings and errors
Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Al Viro
3064c3563b death to mnt_pinned
Rather than playing silly buggers with vfsmount refcounts, just have
acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
and replace it with said clone.  Then attach the pin to original
vfsmount.  Voila - the clone will be alive until the file gets closed,
making sure that underlying superblock remains active, etc., and
we can drop the original vfsmount, so that it's not kept busy.
If the file lives until the final mntput of the original vfsmount,
we'll notice that there's an fs_pin (one in bsd_acct_struct that
holds that file) and mnt_pin_kill() will take it out.  Since
->kill() is synchronous, we won't proceed past that point until
these files are closed (and private clones of our vfsmount are
gone), so we get the same ordering warranties we used to get.

mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
it never became usable outside of kernel/acct.c (and racy wrt
umount even there).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Al Viro
8fa1f1c2bd make fs/{namespace,super}.c forget about acct.h
These externs belong in fs/internal.h.  Rename (they are not acct-specific
anymore) and move them over there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:09 -04:00
Al Viro
efb170c228 take fs_pin stuff to fs/*
Add a new field to fs_pin - kill(pin).  That's what umount and r/o remount
will be calling for all pins attached to vfsmount and superblock resp.
Called after bumping the refcount, so it won't go away under us.  Dropping
the refcount is responsibility of the instance.  All generic stuff moved to
fs/fs_pin.c; the next step will rip all the knowledge of kernel/acct.c from
fs/super.c and fs/namespace.c.  After that - death to mnt_pin(); it was
intended to be usable as generic mechanism for code that wants to attach
objects to vfsmount, so that they would not make the sucker busy and
would get killed on umount.  Never got it right; it remained acct.c-specific
all along.  Now it's very close to being killable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
1629d0eb3e start carving bsd_acct_struct up
pull generic parts into struct fs_pin.  Eventually we want those
to replace mnt_pin()/mnt_unpin() mess; that stuff will move to
fs/*.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
215748e67d acct: move mnt_pin() upwards.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
17c0a5aaff make acct_kill() wait for file closing.
Do actual closing of file via schedule_work().  And use
__fput_sync() there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
0aec09d049 drop ->s_umount around acct_auto_close()
just repeat the frozen check after regaining it, and check that sb
is still alive.  If several threads hit acct_auto_close() at the
same time, acct_auto_close() will survive that just fine.  And we
really don't want to play with writes and closing the file with
->s_umount held exclusive - it's a deadlock country.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
2798d4ce61 acct: get rid of acct_lock for acct->count
* make acct->count atomic and acct freeing - rcu-delayed.
* instead of grabbing acct_lock around the places where we take a reference,
do that under rcu_read_lock() with atomic_long_inc_not_zero().
* have the new acct locked before making ns->bacct point to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
215752fce3 acct: get rid of acct_list
Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
54a4d58a64 acct: simplify check_free_space()
a) file can't be NULL
b) file can't be changed under us
c) all writes are serialized by acct->lock; no need to mess with
spinlock there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
b8f00e6be4 acct: new lifetime rules
Do not reuse bsd_acct_struct after closing the damn thing.
Structure lifetime is controlled by refcount now.  We also
have a mutex in there, held over closing and writing (the
file is O_APPEND, so we are not losing any concurrency).

As the result, we do not need to bother with get_file()/fput()
on log write anymore.  Moreover, do_acct_process() only needs
acct itself; file and pidns are picked from it.

Killed instances are distinguished by having NULL ->ns.
Refcount is protected by acct_lock; anybody taking the
mutex needs to grab a reference first.

The things will get a lot simpler in the next commits - this
is just the minimal chunk switching to the new lifetime rules.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
9df7fa16ee acct: serialize acct_on()
brute-force - on a global mutex that isn't nested into anything.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
795a2f22a8 acct() should honour the limits from the very beginning
We need to check free space on the first write to freshly opened log.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Al Viro
e25ff11ff1 split the slow path in acct_process() off
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Al Viro
cdd37e2309 separate namespace-independent parts of filling acct_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Al Viro
ed44724b79 acct: switch to __kernel_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Al Viro
ecfdb33d1f acct: encode_comp_t(0) is 0, fortunately...
There was an amusing bogosity in ac_rw calculation - it tried to
do encode_comp_t(encode_comp_t(0) / 1024).  Seeing that comp_t is
a 3-bit exponent + 13-bit mantissa... it's a good thing that 0 is
represented by all-bits-clear.

The history of that one is interesting - it was introduced in
2.1.68pre1, when acct.c had been reworked and moved to separate
file.  Two months later (2.1.86) somebody has noticed that the
sucker won't compile - there was no task_struct::io_usage.
At which point the ac_io calculation had changed from
encode_comp_t(current->io_usage) to encode_comp_t(0) and the
bug in the next line (absolutely real back then, had it ever
managed to compile) become a harmless bogosity.  Looks like
nobody has ever noticed until now.

Anyway, let's bury that idiocy now that it got noticed.  17 years
is long enough...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Al Viro
82df9c8beb Merge commit 'ccbf62d8a284cf181ac28c8e8407dd077d90dd4b' into for-next
backmerge to avoid kernel/acct.c conflict
2014-08-07 14:07:57 -04:00
Linus Torvalds
19583ca584 Linux 3.16 2014-08-03 15:25:02 -07:00
Linus Torvalds
8d71844b51 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two fixes in the timer area:
   - a long-standing lock inversion due to a printk
   - suspend-related hrtimer corruption in sched_clock"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  sched_clock: Avoid corrupting hrtimer tree during suspend
2014-08-03 09:58:20 -07:00
Linus Torvalds
3f9c08f7ce Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "A few fixes for ARM.  Some of these are correctness issues:
   - TLBs must be flushed after the old mappings are removed by the DMA
     mapping code, but before the new mappings are established.
   - An off-by-one entry error in the Keystone LPAE setup code.

  Fixes include:
   - ensuring that the identity mapping for LPAE does not remove the
     kernel image from the identity map.
   - preventing userspace from trapping into kgdb.
   - fixing a preemption issue in the Intel iwmmxt code.
   - fixing a build error with nommu.

  Other changes include:
   - Adding a note about which areas of memory are expected to be
     accessible while the identity mapping tables are in place"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
  ARM: idmap: add identity mapping usage note
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  ARM: fix alignment of keystone page table fixup
  ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled
  ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable()
  ARM: DMA: ensure that old section mappings are flushed from the TLB
2014-08-02 10:57:39 -07:00
Omar Sandoval
6bf755db4d ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-08-02 15:20:30 +01:00
Russell King
c5cc87fa8d ARM: idmap: add identity mapping usage note
Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map.  This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-08-02 15:20:26 +01:00
Linus Torvalds
1f2609460c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "This contains a couple of fixes - one is the aio fix from Christoph,
  the other a fallocate() one from Eric"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: fix check for fallocate on active swapfile
  direct-io: fix AIO regression
2014-08-01 18:01:41 -07:00
Linus Torvalds
f88cf230a4 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Peter Anvin:
 "A single fix to not invoke the espfix code on Xen PV, as it turns out
  to oops the guest when invoked after all.  This patch leaves some
  amount of dead code, in particular unnecessary initialization of the
  espfix stacks when they won't be used, but in the interest of keeping
  the patch minimal that cleanup can wait for the next cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/entry/xen: Do not invoke espfix64 on Xen
2014-08-01 17:37:01 -07:00
Linus Torvalds
ecb679fc2c Staging driver bugfixes for 3.16-rc8
Here are some tiny staging driver bugfixes that I've had in my tree for
 the past week that resolve some reported issues.  Nothing major at all,
 but it would be good to get them merged for 3.16-rc8 or -final.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlPcBekACgkQMUfUDdst+yljIACggnJMs6Hj7yn4Ml5YFws77lTD
 R6AAnR7zzqrw+5pqOpH+ILZRZohmpwwE
 =SM27
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver bugfixes from Greg KH:
 "Here are some tiny staging driver bugfixes that I've had in my tree
  for the past week that resolve some reported issues.  Nothing major at
  all, but it would be good to get them merged for 3.16-rc8 or -final"

* tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vt6655: Fix disassociated messages every 10 seconds
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  staging: rtl8723au: rtw_resume(): release semaphore before exit on error
  iio:bma180: Missing check for frequency fractional part
  iio:bma180: Fix scale factors to report correct acceleration units
  iio: buffer: Fix demux table creation
2014-08-01 17:16:05 -07:00
Linus Torvalds
818be5894e Fix dm bufio shrinker to properly zero-fill all fields.
Fix race in dm cache that caused improper reporting of the number of
 dirty blocks in the cache.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT28K2AAoJEMUj8QotnQNau30IAOAiW683+TnSpq4MXRywUNKl
 gaLtC65Q04KesZ2FtZYaYUOdfF8nsmuyBiSlHl8RLzGWCYJcQQAMdXVEgCZ+qXKX
 OhD85zhySXUamvKfq2wX452kAK6O2eR//Azc3d57uWhGboXTZrqTDc4QLRDJQoAF
 8b4g4r/NCV6fAtHtEfB9JtLyrk1kOUGpHdF2rSFy20IUKs1RPZRZzNYEh5KB9ZWI
 DeoZY6GrqR1bLlOAL3Usd43fYGdgv+Mn1HaR+Xgn7LVl2HpRIts4M8Y54F0xko4F
 T26rJMZlMppolgZBXElrDm8ly6bDPglfU7ymJMleiPEdCLhpLX/jqoPEtgsqGI8=
 =pQe5
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Fix dm bufio shrinker to properly zero-fill all fields.

  Fix race in dm cache that caused improper reporting of the number of
  dirty blocks in the cache"

* tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix race affecting dirty block count
  dm bufio: fully initialize shrinker
2014-08-01 12:50:05 -07:00
Linus Torvalds
9642a1041e ARM: Straggler SoC fix for 3.16
A DT bugfix for Nomadik that had an ambigouos double-inversion of a gpio
 line, and one MAINTAINER URL update that might as well go in now.
 
 We could hold off until the merge window, but then we'll just have to mark
 the DT fix for stable and it just seems like in total causing more work.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJT2dyEAAoJEIwa5zzehBx3rZMQAKwoGghFh2t/4USXOWEDT1ns
 B0cPMvg3wjlOxnJsC4Nvi9BclIBBlUImHoHYG8Un7/aJXkuRpOC36Zg2wWDJUkVC
 NZOwHAj/wBwPTKQnHKSQ2Nyw/h1t1FidlWt07GcFxmYvl00pfE707MHKiRTd/pUY
 T4/VELaUXmcJ2/9WqscNLi4TzuQBd5eBb1n48GRzVsTfjJo7jXExCsuKWNOpbPar
 eDqZjw7KAU73T8d0hJmZtJuI8iZ0I3mMYFrA7Sp1srXEpw/ZKfvn6+SBH3CzZdxz
 oDXZYYBl4wYuqqW7I42esZkcsyGmCE58KS0tYWZgvQUJzrHeyV6myuEn2qt2ROzQ
 Ii5huDBSbxNOk2D+v5JevBG8SuDFMS6Op1Gj9fUB7F/+P+EDzccwSkejPZYAnHDd
 JHWiks6CU4xRzlSrPb4AtQ3UkX0GO1QPlhhT/TQD0wOAn9XMkC9hfYBBn+pnusXm
 l8F+DSGx2UUziwF2f4tTGcpKeiTN71g8xizHFs4LHhctnVeb/G68a5s1Cxsfq7Bd
 tW3Y/b1sy5q/TllKTL6qTSqTS+GPldV/w4IwBm4tJs3Q5lMoJErvFQqTEPpIwcRP
 xDVBT0BpEStvxMNYVhIEE4C50k02zLhqS2CH9pqzHVBFYhhNlp2g23QDsbU2yWgG
 sPg0G+Ts8yxeS1DZotnq
 =sz9W
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM straggler SoC fix from Olof Johansson:
 "A DT bugfix for Nomadik that had an ambigouos double-inversion of a
  gpio line, and one MAINTAINER URL update that might as well go in now.

  We could hold off until the merge window, but then we'll just have to
  mark the DT fix for stable and it just seems like in total causing
  more work"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Update Tegra Git URL
  ARM: nomadik: fix up double inversion in DT
2014-08-01 12:49:02 -07:00
Anssi Hannula
44fa816bb7 dm cache: fix race affecting dirty block count
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.  This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.

People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648

Fix this by using an atomic_t for nr_dirty.

Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-08-01 12:25:22 -04:00
Greg Thelen
d8c712ea47 dm bufio: fully initialize shrinker
1d3d4437ea ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled.  The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags.  So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).

Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.

This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once.  This has been broken since 3.12.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+
2014-08-01 12:07:21 -04:00
Jan Kara
504d58745c timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-01 12:54:41 +02:00