Commit graph

401547 commits

Author SHA1 Message Date
Ingo Molnar
0e73453e17 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Fix uprobes bugs that happen if fork() is called with pending ret-probes,
from Oleg Nesterov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-31 11:11:35 +01:00
Yunkang Tang
5beea882e6 Input: ALPS - add support for model found on Dell XT2
This patch adds support for touchpad found on Dell XT2. It's a dual device
with device ID: 73, 00, 14, that comply with "ALPS_PROTO_V2".

Signed-off-by: Yunkang Tang <yunkang.tang@cn.alps.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-10-31 00:59:20 -07:00
Dave Airlie
74c85e1357 Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few small fixes for radeon (audio regression fix,
stability fix, and an endian bug noticed by coverity).

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: fix incompatible casting on big endian
  drm/radeon: disable bapm on KB
  drm/radeon: use sw CTS/N values for audio on DCE4+
2013-10-31 15:29:10 +10:00
Linus Torvalds
12aee278b5 Merge branch 'akpm' (fixes from Andrew Morton)
Merge three fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
  percpu: fix this_cpu_sub() subtrahend casting for unsigneds
  mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
2013-10-30 14:27:10 -07:00
Greg Thelen
5e8cfc3c75 memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
As of commit 3ea67d06e4 ("memcg: add per cgroup writeback pages
accounting") memcg counter errors are possible when moving charged
memory to a different memcg.  Charge movement occurs when processing
writes to memory.force_empty, moving tasks to a memcg with
memcg.move_charge_at_immigrate=1, or memcg deletion.

An example showing error after memory.force_empty:

  $ cd /sys/fs/cgroup/memory
  $ mkdir x
  $ rm /data/tmp/file
  $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
  [1] 13600
  $ grep ^mapped x/memory.stat
  mapped_file 1048576
  $ echo 13600 > tasks
  $ echo 1 > x/memory.force_empty
  $ grep ^mapped x/memory.stat
  mapped_file 4503599627370496

mapped_file should end with 0.
  4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
  1048576          == 0x10,0000           == 0x100 pages

This issue only affects the source memcg on 64 bit machines; the
destination memcg counters are correct.  So the rmdir case is not too
important because such counters are soon disappearing with the entire
memcg.  But the memcg.force_empty and memory.move_charge_at_immigrate=1
cases are larger problems as the bogus counters are visible for the
(possibly long) remaining life of the source memcg.

The problem is due to memcg use of __this_cpu_from(.., -nr_pages), which
is subtly wrong because it subtracts the unsigned int nr_pages (either
-1 or -512 for THP) from a signed long percpu counter.  When
nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines stat->count[idx]
is signed 64 bit.  So memcg's attempt to simply decrement a count (e.g.
from 1 to 0) boils down to:

  long count = 1
  unsigned int nr_pages = 1
  count += -nr_pages  /* -nr_pages == 0xffff,ffff */
  count is now 0x1,0000,0000 instead of 0

The fix is to subtract the unsigned page count rather than adding its
negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
casting for unsigneds" is applied to fix this_cpu_sub().

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Greg Thelen
bd09d9a351 percpu: fix this_cpu_sub() subtrahend casting for unsigneds
this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter type
is wider than an unsigned adjustment.  An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  __this_cpu_sub_return()
  this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Chen LinX
3017f079ef mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
When walk_page_range walk a memory map's page tables, it'll skip
VM_PFNMAP area, then variable 'next' will to assign to vma->vm_end, it
maybe larger than 'end'.  In next loop, 'addr' will be larger than
'next'.  Then in /proc/XXXX/pagemap file reading procedure, the 'addr'
will growing forever in pagemap_pte_range, pte_to_pagemap_entry will
access the wrong pte.

  BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
  addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
  CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
  Call Trace:
    dump_stack+0x16/0x18
    print_bad_pte+0x114/0x1b0
    vm_normal_page+0x56/0x60
    pagemap_pte_range+0x17a/0x1d0
    walk_page_range+0x19e/0x2c0
    pagemap_read+0x16e/0x200
    vfs_read+0x84/0x150
    SyS_read+0x4a/0x80
    syscall_call+0x7/0xb

Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>	[3.10.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Russell King
c56b097af2 mm: list_lru: fix almost infinite loop causing effective livelock
I've seen a fair number of issues with kswapd and other processes
appearing to get stuck in v3.12-rc.  Using sysrq-p many times seems to
indicate that it gets stuck somewhere in list_lru_walk_node(), called
from prune_icache_sb() and super_cache_scan().

I never seem to be able to trigger a calltrace for functions above that
point.

So I decided to add the following to super_cache_scan():

    @@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
            inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
            dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
            total_objects = dentries + inodes + fs_objects + 1;
    +printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);

            /* proportion the scan between the caches */
            dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
            inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
    +printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
    +BUG_ON(dentries == 0);
    +BUG_ON(inodes == 0);

            /*
             * prune the dcache first as the icache is pinned by it, then
    @@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                    freed += sb->s_op->free_cached_objects(sb, fs_objects,
                                                           sc->nid);
            }
    -
    +printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
            drop_super(sb);
            return freed;
     }

and shortly thereafter, having applied some pressure, I got this:

    update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
    update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
    ------------[ cut here ]------------
    Kernel BUG at c0101994 [verbose debug info unavailable]
    Internal error: Oops - BUG: 0 [#3] SMP ARM
    Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
    CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G      D      3.12.0-rc7+ #154
    task: daea1200 ti: c3bf8000 task.ti: c3bf8000
    PC is at super_cache_scan+0x1c0/0x278
    LR is at trace_hardirqs_on+0x14/0x18
    Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
    ...
    Backtrace:
      (super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
      (shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
      (try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
      (__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
      (__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
      (handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
      (do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
      (do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
      (do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)

Notice that we had a very low number of inodes, which were reduced to
zero my mult_frac().

Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
inodes (0) into that as the number of objects to scan:

    long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
                         int nid)
    {
            LIST_HEAD(freeable);
            long freed;

            freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
                                           &freeable, &nr_to_scan);

which does:

    unsigned long
    list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
                       void *cb_arg, unsigned long *nr_to_walk)
    {

            struct list_lru_node    *nlru = &lru->node[nid];
            struct list_head *item, *n;
            unsigned long isolated = 0;

            spin_lock(&nlru->lock);
    restart:
            list_for_each_safe(item, n, &nlru->list) {
                    enum lru_status ret;

                    /*
                     * decrement nr_to_walk first so that we don't livelock if we
                     * get stuck on large numbesr of LRU_RETRY items
                     */
                    if (--(*nr_to_walk) == 0)
                            break;

So, if *nr_to_walk was zero when this function was entered, that means
we're wanting to operate on (~0UL)+1 objects - which might as well be
infinite.

Clearly this is not correct behaviour.  If we think about the behaviour
of this function when *nr_to_walk is 1, then clearly it's wrong - we
decrement first and then test for zero - which results in us doing
nothing at all.  A post-decrement would give the desired behaviour -
we'd try to walk one object and one object only if *nr_to_walk were one.

It also gives the correct behaviour for zero - we exit at this point.

Fixes: 5cedf721a7 ("list_lru: fix broken LRU_RETRY behaviour")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
[ Modified to make sure we never underflow the count: this function gets
  called in a loop, so the 0 -> ~0ul transition is dangerous  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:57:46 -07:00
Linus Torvalds
ced5d6b552 Serial fixes for 3.12-final
Here are 3 tiny fixes that are needed for 3.12-final for some serial
 drivers.  One of them is a revert of a broken patch, and two others are
 fixes for reported bugs.  All of these have been in linux-next for a
 while, I forgot I had not sent them to you yet, my fault.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJxUPwACgkQMUfUDdst+ymzGgCfSmp/uQAOLnEKJ2WKfElDcbZM
 nhkAnRivp6JzwmFLVEIXORMcmLhyc8Nn
 =FOvb
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial fixes from Greg KH:
 "Here are 3 tiny fixes that are needed for 3.12-final for some serial
  drivers.

  One of them is a revert of a broken patch, and two others are fixes
  for reported bugs.  All of these have been in linux-next for a while,
  I forgot I had not sent them to you yet, my fault"

(Actually, Greg, you _had_ sent two of the three, so this pulls in just
one actual new fix)

* tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty/serial: at91: fix uart/usart selection for older products
2013-10-30 12:29:06 -07:00
Linus Torvalds
b8cab70665 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Mainly Intel regression fixes and quirks, along with a simple one
  liner to fix rendernodes ioctl access (off by default, but testers
  want to test it)"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm: allow DRM_IOCTL_VERSION on render-nodes
  drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
  drm/i915: No LVDS hardware on Intel D410PT and D425KT
  drm/i915/dp: workaround BIOS eDP bpp clamping issue
  drm/i915: Add HSW CRT output readout support
  drm/i915: Add support for pipe_bpp readout
2013-10-30 12:27:12 -07:00
Linus Torvalds
182b4fd9f3 sound fixes for 3.12-final
A few small HD-audio regression fixes, mostly for stable kernels, too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJScOAKAAoJEGwxgFQ9KSmknI8P/18upoLMoGn+zJYkfTFPkpXY
 X59FdVi9C/7wb+LQz9geHzDI4npZyLCVp2HUfwxCw12//MtB4D/zAxEC+1eyqjyK
 V45IUyBwly54+xQQdFdT1CkDmx6sW0Q+yfy38+KJR5IDPHSXw6IJg3pNOKJu5/Gh
 YB7FjnUdy2BLaIp1FDGS6NVkDfkSZZSTGvh6e1YXXennf1fhy0AalRx2y7091OhW
 hPjBMCatPeiiaUbTeduYrC0/6NjENNjsM/C2j489R47JtYELihwm2DHV+RxVQMGC
 kPmHXGIvqTZaIgX5mIrAlUfWAojx0W7Bt8I0L768ofg8NqF5x1Nohtnsq8Vmg4jz
 K8C7g/0AIR58bD4Q2eavVEMevlpeAaF79pTwFvShProPrposE+sv/0a38u2VarxS
 fBc8DgnBtZzYbaZRG/0gHEXS2WzOkaP7IMv/0B8F5W+cOQfaJCUUWZPv86ebGIcY
 0y4q0rDfQ58Wq9p3QEr1Z7E38gusmvU1pLteWZ+3jw661uy3prOiJRX+O3XTrBk5
 6ZABBrsMLkiNqxOeNZtFu0Gjdyq4KbNV/LSMUFz0X5E+JGGqZndvl1/AJZM6pppi
 dB28rAYwXD0TF9RvEZsm9uxPKKdFk+lslfTwZP1xJQimHR0WkS6chU1DjYp+2eWL
 7ug1CEQAAFd4ZdEnp+dL
 =HqzC
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A few small HD-audio regression fixes, mostly for stable kernels, too"

* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix silent headphone on Thinkpads with AD1984A codec
  ALSA: hda - Add missing initial vmaster hook at build_controls callback
  ALSA: hda - Fix unbalanced runtime PM refcount after S3/S4
2013-10-30 12:26:29 -07:00
Linus Torvalds
96d33b086b Fixes for the 3.12 debugfs problem---removing the duplicate directory
name, and using a better the error code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJScOswAAoJEBvWZb6bTYbyCO8P/RoJy4EgsEjv9Qxfr+wfHXX0
 QOz/7fqEsFFMDh1Fiyjvg9uNYj5jQO9lzQfX0ea0uc4svIMUBWaAlyQVL/U0igkp
 2DrIoRh1aNa9Lt4RDsGswapreJylGgyhADa2W9r1tlPna7dlaPCEHGxZBR112KJc
 R9t0PMIq7NBGvPGBhiOkjeDVlqWUrEQvxOQW8ZQZw2pBc54DSreY9nq3n1dIhqMp
 gxpAKi3h5YWK8lzFxFSOLKnRqC+5SAUj2Bk8g37ZESJXp7bpmn74DpuK4KiLLokP
 V3rUvHO1p0eZSx0QWuV9QrPV6Q1VVuhLEinWnCdejhXUqk/tCR5EOMKMcHIk2xad
 6ApE7a4UPs9VmXp5Ry7/k6lorOrBuvwArPwJOCpCN7zAN1gEmmfFg3KbFdtbaIgx
 g4hEtqWQ6VULBxyDXCFHYbXGLS22kgWoQCR3ToGqbjzuZmZph49NbLjlt6d7o6w0
 n+UMZqcpLA9eUhq4Yi0IAGe45pizm5rdC4xRpbH7bK7WmKywYw+zf+JSOheTLwtm
 ER0sTmcyMSP68ysnJCLgh4hYf1lxoaQ/8SEdh331VIrThmHYyWGZD4Ol9YKg5xwU
 Yzd3wedxYYIiJ1vFfZykMWUT9zLU3feMDvq2s2maqg+uV/b4uPfZhwnFLBU+PZKC
 cxkXsb/7oZ//tmGF1MEm
 =GwaF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Fixes for the 3.12 debugfs problem - removing the duplicate directory
  name, and using a better the error code"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: use a more sensible error number when debugfs directory creation fails
  KVM: Fix modprobe failure for kvm_intel/kvm_amd
2013-10-30 12:25:15 -07:00
Dan Carpenter
a8b33654b1 Staging: sb105x: info leak in mp_get_count()
The icount.reserved[] array isn't initialized so it leaks stack
information to userspace.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:50 -07:00
Dan Carpenter
8d1e72250c Staging: bcm: info leak in ioctl
The DevInfo.u32Reserved[] array isn't initialized so it leaks kernel
information to user space.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
b5e2f33986 staging: wlags49_h2: buffer overflow setting station name
We need to check the length parameter before doing the memcpy().  I've
actually changed it to strlcpy() as well so that it's NUL terminated.

You need CAP_NET_ADMIN to trigger these so it's not the end of the
world.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
f856567b93 aacraid: missing capable() check in compat ioctl
In commit d496f94d22 ('[SCSI] aacraid: fix security weakness') we
added a check on CAP_SYS_RAWIO to the ioctl.  The compat ioctls need the
check as well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
c2c65cd2e1 staging: ozwpan: prevent overflow in oz_cdev_write()
We need to check "count" so we don't overflow the ei->data buffer.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
201f99f170 uml: check length in exitcode_proc_write()
We don't cap the size of buffer from the user so we could write past the
end of the array here.  Only root can write to this file.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Takashi Iwai
c4a4ddaefb ASoC: Fixes for v3.12
A few of the Coverity fixes from Takashi, one of which (the wm_hubs one)
 is particularly noticable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJScUA0AAoJELSic+t+oim9Yu8P/Rve++VdhWT/NPrhRvHLhnt7
 CXaXdAKt2tS8mZCrvho/G2b22ZGJML1RseCF4ORIPYaLAnk86G6mqNldFWGiBy1D
 ainPIQKahc0Pxu9TlFuj27r6M+S/t3JXxgX9R8rJr8yz4JRVzCucrRMlrFPXXPNW
 PDu9djCf6Mv3X9M/tpnZUQo2gpkmfzWmrF6/TSb/91TT7ZhYouzC+pkydHvyvXU7
 Ah3MhNPe7rXXNZsdxn8jSPwN4g+vHgkLYg+OrrCtfOppG0XLQPzmq1xWEeqSltLi
 hiVv36VHzzCAymqn29B8kUidRlYcRQiQliv50B4zUCKj78LsEr6/bO3PLpiJkmtm
 DQTckVHT6i2teQdFsBoJ9GDZhkIphk3wVFivG47DhmAK7W1vB2pX67DF4lKrsz8w
 wSwtcUj9lPVoEK/stxTEAU+UNvIrwhbEKxwDi9F4K4eQkaf4CtmTq37cE2QR4uFY
 lxqqYU8oCUypzCEtWXZ28N2+BBCoH+Fjw9LgM0eIYrw6UgmtmhY6Az05/T9ZjagP
 PYC9I+ootacB2ApBF6E2wM+3zT7tookNCa2S3pd0kYrYXXqo0P79h2JUyfhQwCHn
 CY29c0YassHjL5lTv0rBW9lbNAZg8wFAMsguOMqXFi/msFKhcpH2aLZ5+R7mUl0s
 QPIzGI3OMMFMx6PDQ3KS
 =J0sP
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v3.12

A few of the Coverity fixes from Takashi, one of which (the wm_hubs one)
is particularly noticable.
2013-10-30 18:42:13 +01:00
Mark Brown
8723b795aa Merge remote-tracking branch 'asoc/fix/wm8994' into asoc-linus 2013-10-30 10:11:55 -07:00
Takashi Iwai
268ff14525 ASoC: wm_hubs: Add missing break in hp_supply_event()
Spotted by coverity CID 115170.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org
2013-10-30 09:35:22 -07:00
Markos Chandras
13b7ea6377 MIPS: malta: Fix GIC interrupt offsets
The GIC interrupt offsets are calculated based on the value of NR_CPUS.
However, this is wrong because NR_CPUS may or may not contain the real
number of the actual cpus present in the system. We fix that by using
the 'nr_cpu_ids' variable which contains the real number of cpus in
the system. Previously, an MT core (eg with 8 VPEs) will fail to boot if
NR_CPUS was > 8 with the following errors:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/irq/chip.c:670 __irq_set_handler+0x15c/0x164()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W    3.12.0-rc5-00087-gced5633 5
Stack : 00000006 00000004 00000000 00000000 00000000 00000000 807a4f36 00000053
          807a0000 00000000 80173218 80565aa8 00000000 00000000 00000000 0000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000
          00000000 00000000 00000000 8054fd00 8054fd94 80500514 805657a7 8016eb4
          807a0000 80500514 00000000 00000000 80565aa8 8079a5d8 80565766 8054fd0
          ...
Call Trace:
[<801098c0>] show_stack+0x64/0x7c
[<8049c6b0>] dump_stack+0x64/0x84
[<8012efc4>] warn_slowpath_common+0x84/0xb4
[<8012f00c>] warn_slowpath_null+0x18/0x24
[<80173218>] __irq_set_handler+0x15c/0x164
[<80587cf4>] arch_init_ipiirq+0x2c/0x3c
[<805880c8>] arch_init_irq+0x3c4/0x4bc
[<80588e28>] init_IRQ+0x3c/0x50
[<805847e8>] start_kernel+0x230/0x3d8

---[ end trace 4eaa2a86a8e2da26 ]---

This is now fixed and the Malta board can boot with any NR_CPUS value
which also helps supporting more processors in a single kernel binary.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6091/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-10-30 15:43:18 +01:00
Mika Westerberg
ab1225901d Revert "ACPI / hotplug / PCI: Avoid doing too much for spurious notifies"
Commit 2dc4128 (ACPI / hotplug / PCI: Avoid doing too much for
spurious notifies) changed the enable_slot() to check return value of
pci_scan_slot() and if it is zero return early from the function. It
means that there were no new devices in this particular slot.

However, if a device appeared deeper in the hierarchy the code now
ignores it causing things like Thunderbolt chaining fail to recognize
new devices.

The problem with Alex Williamson's machine was solved with commit
a47d8c8 (ACPI / hotplug / PCI: Avoid parent bus rescans on spurious
device checks) and hence we should be able to restore the original
functionality that we always rescan on bus check notification.

On a device check notification we still check what acpiphp_rescan_slot()
returns and on zero bail out early.

Fixes: 2dc41281b1 (ACPI / hotplug / PCI: Avoid doing too much for spurious notifies)
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-30 15:28:52 +01:00
Rafael J. Wysocki
59612d1879 Revert "select: use freezable blocking call"
This reverts commit 9745cdb36d (select: use freezable blocking call)
that triggers problems during resume from suspend to RAM on Paul Bolle's
32-bit x86 machines.  Paul says:

  Ever since I tried running (release candidates of) v3.11 on the two
  working i686s I still have lying around I ran into issues on resuming
  from suspend. Reverting 9745cdb36d (select: use freezable blocking
  call) resolves those issues.

  Resuming from suspend on i686 on (release candidates of) v3.11 and
  later triggers issues like:

  traps: systemd[1] general protection ip:b738e490 sp:bf882fc0 error:0 in libc-2.16.so[b731c000+1b0000]

  and

  traps: rtkit-daemon[552] general protection ip:804d6e5 sp:b6cb32f0 error:0 in rtkit-daemon[8048000+d000]

  Once I hit the systemd error I can only get out of the mess that the
  system is at that point by power cycling it.

Since we are reverting another freezer-related change causing similar
problems to happen, this one should be reverted as well.

References: https://lkml.org/lkml/2013/10/29/583
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Fixes: 9745cdb36d (select: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:28:35 +01:00
Rafael J. Wysocki
c511851de1 Revert "epoll: use freezable blocking call"
This reverts commit 1c441e9212 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.

Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.

References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e9212 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:27:53 +01:00
Takashi Iwai
6fc16e58ad ALSA: hda - Add a fixup for ASUS N76VZ
ASUS N76VZ needs the same fixup as N56VZ for supporting the boost
speaker.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=846529
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-10-30 12:31:35 +01:00
Paolo Bonzini
0c8eb04a62 KVM: use a more sensible error number when debugfs directory creation fails
I don't know if this was due to cut and paste, or somebody was really
using a D20 to pick the error code for kvm_init_debugfs as suggested by
Linus (EFAULT is 14, so the possibility cannot be entirely ruled out).

In any case, this patch fixes it.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:15:34 +01:00
Tim Gardner
d780a31271 KVM: Fix modprobe failure for kvm_intel/kvm_amd
The x86 specific kvm init creates a new conflicting
debugfs directory which causes modprobe issues
with kvm_intel and kvm_amd. For example,

sudo modprobe kvm_amd
modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following
KVM config options are set:

CONFIG_KVM_GUEST=y
CONFIG_KVM_DEBUG_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:10:42 +01:00
David Herrmann
3d3b78c06c drm: allow DRM_IOCTL_VERSION on render-nodes
DRM_IOCTL_VERSION is a reliable way to get the driver-name and version
information. It's not related to the interface-version (SET_VERSION ioctl)
so we can safely enable it on render-nodes.

Note that gbm uses udev-BUSID to load the correct mesa driver. However,
the VERSION ioctl should be the more reliable way to do this (in case we
add new DRM-bus drivers which have no BUSID or similar).

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-30 14:41:56 +10:00
Dave Airlie
2f2632ff6e Merge tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Regression and warn fixes for i915.

* tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
  drm/i915: No LVDS hardware on Intel D410PT and D425KT
  drm/i915/dp: workaround BIOS eDP bpp clamping issue
  drm/i915: Add HSW CRT output readout support
  drm/i915: Add support for pipe_bpp readout
2013-10-30 12:30:12 +10:00
Deng-Cheng Zhu
7f081f1755 MIPS: Perf: Fix 74K cache map
According to Software User's Manual, the event of last-level-cache
read/write misses is mapped to even counters. Odd counters of that
event number count miss cycles.

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6036/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-10-29 21:18:23 +01:00
Linus Torvalds
7314e613d5 Fix a few incorrectly checked [io_]remap_pfn_range() calls
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper.  This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.

Reported-by: Nico Golde <nico@ngolde.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.
2013-10-29 10:21:34 -07:00
Oleg Nesterov
3ab6796617 uprobes: Teach uprobe_copy_process() to handle CLONE_VFORK
uprobe_copy_process() does nothing if the child shares ->mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.

Note: in theory, it would be better to check task_pt_regs(p)->sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-10-29 18:02:55 +01:00
Oleg Nesterov
aa59c53fd4 uprobes: Change uprobe_copy_process() to dup xol_area
This finally fixes the serious bug in uretprobes: a forked child
crashes if the parent called fork() with the pending ret probe.

Trivial test-case:

	# perf probe -x /lib/libc.so.6 __fork%return
	# perf record -e probe_libc:__fork perl -le 'fork || print "OK"'

(the child doesn't print "OK", it is killed by SIGSEGV)

If the child returns from the probed function it actually returns
to trampoline_vaddr, because it got the copy of parent's stack
mangled by prepare_uretprobe() when the parent entered this func.

It crashes because a) this address is not mapped and b) until the
previous change it doesn't have the proper->return_instances info.

This means that uprobe_copy_process() has to create xol_area which
has the trampoline slot, and its vaddr should be equal to parent's
xol_area->vaddr.

Unfortunately, uprobe_copy_process() can not simply do
__create_xol_area(child, xol_area->vaddr). This could actually work
but perf_event_mmap() doesn't expect the usage of foreign ->mm. So
we offload this to task_work_run(), and pass the argument via not
yet used utask->vaddr.

We know that this vaddr is fine for install_special_mapping(), the
necessary hole was recently "created" by dup_mmap() which skips the
parent's VM_DONTCOPY area, and nobody else could use the new mm.

Unfortunately, this also means that we can not handle the errors
properly, we obviously can not abort the already completed fork().
So we simply print the warning if GFP_KERNEL allocation (the only
possible reason) fails.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:54 +01:00
Oleg Nesterov
248d3a7b2f uprobes: Change uprobe_copy_process() to dup return_instances
uprobe_copy_process() assumes that the new child doesn't need
->utask, it should be allocated by demand.

But this is not true if the forking task has the pending ret-
probes, the child should report them as well and thus it needs
the copy of parent's ->return_instances chain. Otherwise the
child crashes when it returns from the probed function.

Alternatively we could cleanup the child's stack, but this needs
per-arch changes and this is not what we want. At least systemtap
expects a .return in the child too.

Note: this change alone doesn't fix the problem, see the next
change.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:53 +01:00
Oleg Nesterov
af0d95af79 uprobes: Teach __create_xol_area() to accept the predefined vaddr
Currently xol_add_vma() uses get_unmapped_area() for area->vaddr,
but the next patches need to use the fixed address. So this patch
adds the new "vaddr" argument to __create_xol_area() which should
be used as area->vaddr if it is nonzero.

xol_add_vma() doesn't bother to verify that the predefined addr is
not used, insert_vm_struct() should fail if find_vma_links() detects
the overlap with the existing vma.

Also, __create_xol_area() doesn't need __GFP_ZERO to allocate area.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:51 +01:00
Oleg Nesterov
6441ec8b7c uprobes: Introduce __create_xol_area()
No functional changes, preparation.

Extract the code which actually allocates/installs the new area
into the new helper, __create_xol_area().

While at it remove the unnecessary "ret = ENOMEM" and "ret = 0"
in xol_add_vma(), they both have no effect.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:50 +01:00
Oleg Nesterov
b68e074910 uprobes: Change the callsite of uprobe_copy_process()
Preparation for the next patches.

Move the callsite of uprobe_copy_process() in copy_process() down
to the succesfull return. We do not care if copy_process() fails,
uprobe_free_utask() won't be called in this case so the wrong
->utask != NULL doesn't matter.

OTOH, with this change we know that copy_process() can't fail when
uprobe_copy_process() is called, the new task should either return
to user-mode or call do_exit(). This way uprobe_copy_process() can:

	1. setup p->utask != NULL if necessary

	2. setup uprobes_state.xol_area

	3. use task_work_add(p)

Also, move the definition of uprobe_copy_process() down so that it
can see get_utask().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:48 +01:00
Ralf Baechle
c2d3f25dda uprobes: Remove the wrong __weak attribute
linux/uprobes.h declares arch_uprobe_skip_sstep() as a weak function.
But as there is no definition of generic version so when trying to build
uprobes for an architecture that doesn't yet have a arch_uprobe_skip_sstep()
implementation, the vmlinux will try to call arch_uprobe_skip_sstep()
somehwere in Stupidhistan leading to a system crash.  We rather want a
proper link error so remove arch_uprobe_skip_sstep().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-10-29 18:02:47 +01:00
Linus Torvalds
f9ec2e6f79 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
 "This contains five tooling fixes:

   - fix a remaining mmap2 assumption which resulted in perf top output
     breakage
   - fix mmap ring-buffer processing bug that corrupts data
   - fix for a severe python scripting memory leak
   - fix broken (and user-visible) -g option handling
   - fix stdio output

  The diffstat size is larger than what we'd like to see this late :-/"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fixup mmap event consumption
  perf top: Split -G and --call-graph
  perf record: Split -g and --call-graph
  perf hists: Add color overhead for stdio output buffer
  perf tools: Fix up /proc/PID/maps parsing
  perf script python: Fix mem leak due to missing Py_DECREFs on dict entries
2013-10-29 08:36:50 -07:00
Linus Torvalds
2a999aa0a1 Kconfig: make KOBJECT_RELEASE debugging require timer debugging
Without the timer debugging, the delayed kobject release will just
result in undebuggable oopses if it triggers any latent bugs.  That
doesn't actually help debugging at all.

So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid
having people enable one without the other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-29 08:33:36 -07:00
Peter Zijlstra
5a3126d4fe perf: Fix the perf context switch optimization
Currently we only optimize the context switch between two
contexts that have the same parent; this forgoes the
optimization between parent and child context, even though these
contexts could be equivalent too.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Shishkin, Alexander <alexander.shishkin@intel.com>
Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 14:13:01 +01:00
Daniel Vetter
1fbc0d789d drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
Originally I've thought that this is leftover hw state dirt from the
BIOS. But after way too much helpless flailing around on my part I've
noticed that the actual bug is when we change the state of an already
active pipe.

For example when we change the fdi lines from 2 to 3 without switching
off outputs in-between we'll never see the crucial on->off transition
in the ->modeset_global_resources hook the current logic relies on.

Patch version 2 got this right by instead also checking whether the
pipe is indeed active. But that in turn broke things when pipes have
been turned off through dpms since the bifurcate enabling is done in
the ->crtc_mode_set callback.

To address this issues discussed with Ville in the patch review move
the setting of the bifurcate bit into the ->crtc_enable hook. That way
we won't wreak havoc with this state when userspace puts all other
outputs into dpms off state. This also moves us forward with our
overall goal to unify the modeset and dpms on paths (which we need to
have to allow runtime pm in the dpms off state).

Unfortunately this requires us to move the bifurcate helpers around a
bit.

Also update the commit message, I've misanalyzed the bug rather badly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507
Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-29 13:52:56 +01:00
Peter Zijlstra
e00b12e64b perf/x86: Further optimize copy_from_user_nmi()
Now that we can deal with nested NMI due to IRET re-enabling NMIs and
can deal with faults from NMI by making sure we preserve CR2 over NMIs
we can in fact simply access user-space memory from NMI context.

So rewrite copy_from_user_nmi() to use __copy_from_user_inatomic() and
rework the fault path to do the minimal required work before taking
the in_atomic() fault handler.

In particular avoid perf_sw_event() which would make perf recurse on
itself (it should be harmless as our recursion protections should be
able to deal with this -- but why tempt fate).

Also rename notify_page_fault() to kprobes_fault() as that is a much
better name; there is no notifier in it and its specific to kprobes.

Don measured that his worst case NMI path shrunk from ~300K cycles to
~150K cycles.

Cc: Stephane Eranian <eranian@google.com>
Cc: jmario@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: dave.hansen@linux.intel.com
Tested-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131024105206.GM2490@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:54 +01:00
Peter Zijlstra
2c42cfbfe1 perf: Change zero-padding of strings in perf_event_mmap_event()
Oleg complained about the excessive 0-ing in perf_event_mmap_event(),
so try and be smarter about it while keeping it fairly fool proof and
avoid leaking random bits out to userspace.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8jirlm99m6if2z13wd6rbyu6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:53 +01:00
Oleg Nesterov
3ea2f2b96f perf: Do not waste PAGE_SIZE bytes for ALIGN(8) in perf_event_mmap_event()
perf_event_mmap_event() does kzalloc(PATH_MAX + sizeof(u64)) to
ensure we can align the size later. However this means that we
actually allocate PAGE_SIZE * 2 buffer, seems too much.

Change this code to allocate PATH_MAX==PAGE_SIZE bytes, but tell
d_path() to not use the last sizeof(u64) bytes.

Note: it is not clear why do we need __GFP_ZERO, see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016201004.GC23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:52 +01:00
Oleg Nesterov
32c5fb7e7d perf: Kill the dead !vma->vm_mm code in perf_event_mmap_event()
1. perf_event_mmap(vma) is never called with a gate_vma-like arg,
   remove the "if (!vma->vm_mm)" code.

2. arch_vma_name() can use the chached value of mmap_event->vma.

3. Change the code to not call arch_vma_name() twice.

4. Purely cosmetic, but since we use "goto got_name" all the time
   remove "else" from "[stack]" branch just for symmetry.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:51 +01:00
Peter Zijlstra
d9494cb429 perf: Remove useless atomic_t
There's nothing atomic about atomic_set vs atomic_read; so remove the
atomic_t usage.

Also, make running_sample_length static as it really is (and should
be) local to this translation unit.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/n/tip-vw9lg588x1ic248whybjon0c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:49 +01:00
Peter Zijlstra
e8a923cc1f perf/x86: Fix NMI measurements
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.

What triggered it is that my WSM-EP is broken :-(

  [    0.001000] tsc: Fast TSC calibration using PIT
  [    0.002000] tsc: Detected 2533.715 MHz processor
  [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
  [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
  [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed

For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.

This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.

While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.

So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:

          <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
  ...
          <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990

So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!

Now since all this measurement stuff lives in x86 code, we can actually
fix it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:20 +01:00
Peter Zijlstra
bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00