Commit graph

810570 commits

Author SHA1 Message Date
Kailang Yang
4d4b0c52bd ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
Forgot to add unplug function to unplug state of headset mode
for ALC225.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-01-09 10:21:57 +01:00
Ingo Molnar
576b50ea23 perf/core fixes and improvements:
perf top:
 
   Arnaldo Carvalho de Melo:
 
   . Lift restriction on using callchains without "sym" in --sort, allowing
 
 perf trace:
 
   Arnaldo Carvalho de Melo:
 
   . Fix ')' placement in "interrupted" syscall lines.
 
   . Fix alignment for [continued] lines.
 
 perf tests:
 
   Florian Fainelli:
 
   . Add a test for the ARM 32-bit [vectors] page.
 
 tools lib traceevent:
 
   Tzvetomir Stoyanov:
 
   . Introduce new libtracevent API: tep_override_comm().
 
   . Initialize host_bigendian at tep_handle allocation.
 
   . More namespacing changes.
 
   . Remove superfluous APIs.
 
 tools headers uapi:
 
   Arnaldo Carvalho de Melo:
 
   . Update linux/{fs,vhost}.h, grab a copy o linux/mount.h, where the
     MS_ mount flags were moved.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXDT1GAAKCRCyPKLppCJ+
 JxpIAQD4gwKryQB3RwDIhznIjpaR1voHNPyEdSiLC5mKqLI8aQD/TQUr472WFjDB
 6rJWflAd8IjVD4PKgNNlIAcMWMApPgA=
 =rsPE
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.0-20190108' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

perf top:

  Arnaldo Carvalho de Melo:

  - Lift restriction on using callchains without "sym" in --sort

perf trace:

  Arnaldo Carvalho de Melo:

  - Fix ')' placement in "interrupted" syscall lines.

  - Fix alignment for [continued] lines.

perf tests:

  Florian Fainelli:

  - Add a test for the ARM 32-bit [vectors] page.

tools lib traceevent:

  Tzvetomir Stoyanov:

  - Introduce new libtracevent API: tep_override_comm().
  - Initialize host_bigendian at tep_handle allocation.
  - More namespacing changes.
  - Remove superfluous APIs.

tools headers uapi:

  Arnaldo Carvalho de Melo:

  . Update linux/{fs,vhost}.h, grab a copy o linux/mount.h, where the
    MS_ mount flags were moved.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-09 07:59:40 +01:00
Zhenyu Wang
f0e9943725 drm/i915/gvt: Fix workload request allocation before request add
In commit 6bb2a2af8b ("drm/i915/gvt: Fix crash after request->hw_context change"),
forgot to handle workload scan path in ELSP handler case which was to
optimize scanning earlier instead of in gvt submission thread, so request
alloc and add was splitting then which is against right process.

This trys to do a partial revert of that commit which still has workload
request alloc helper and make sure shadow state population is handled after
request alloc for target state buffer.

v3: Fix missed workload status setting in request alloc error path
v2: Fix dispatch workload err path that should add request after alloc anyway.

Fixes: 6bb2a2af8b ("drm/i915/gvt: Fix crash after request->hw_context change")
Cc: Bin Yang <bin.yang@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Bin Yang <bin.yang@intel.com>
Reviewed-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2019-01-09 12:59:09 +08:00
Jeff Moyer
40405851af block: clarify documentation for blk_{start|finish}_plug
There was some confusion about what these functions did.  Make it clear
that this is a hint for upper layers to pass to the block layer, and
that it does not guarantee that I/O will not be submitted between a
start and finish plug.

Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-08 20:49:46 -07:00
Linus Torvalds
a88cc8da02 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: do not wake kswapd with zone lock held
  hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization"
  hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race"
  mm: page_mapped: don't assume compound page is huge or THP
  mm/memory.c: initialise mmu_notifier_range correctly
  tools/vm/page_owner: use page_owner_sort in the use example
  kasan: fix krealloc handling for tag-based mode
  kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
  kasan, arm64: use ARCH_SLAB_MINALIGN instead of manual aligning
  mm, memcg: fix reclaim deadlock with writeback
  mm/usercopy.c: no check page span for stack objects
  slab: alien caches must not be initialized if the allocation of the alien cache failed
  fork, memcg: fix cached_stacks case
  zram: idle writeback fixes and cleanup
2019-01-08 18:58:29 -08:00
Stafford Horne
9cb2feb4d2 arch/openrisc: Fix issues with access_ok()
The commit 594cc251fd ("make 'user_access_begin()' do 'access_ok()'")
exposed incorrect implementations of access_ok() macro in several
architectures.  This change fixes 2 issues found in OpenRISC.

OpenRISC was not properly using parenthesis for arguments and also using
arguments twice.  This patch fixes those 2 issues.

I test booted this patch with v5.0-rc1 on qemu and it's working fine.

Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 18:22:30 -08:00
Mel Gorman
73444bc4d8 mm, page_alloc: do not wake kswapd with zone lock held
syzbot reported the following regression in the latest merge window and
it was confirmed by Qian Cai that a similar bug was visible from a
different context.

  ======================================================
  WARNING: possible circular locking dependency detected
  4.20.0+ #297 Not tainted
  ------------------------------------------------------
  syz-executor0/8529 is trying to acquire lock:
  000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
  __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

  but task is already holding lock:
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
  include/linux/spinlock.h:329 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
  mm/page_alloc.c:2548 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
  mm/page_alloc.c:3021 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
  mm/page_alloc.c:3050 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
  mm/page_alloc.c:3072 [inline]
  000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
  get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

It appears to be a false positive in that the only way the lock ordering
should be inverted is if kswapd is waking itself and the wakeup
allocates debugging objects which should already be allocated if it's
kswapd doing the waking.  Nevertheless, the possibility exists and so
it's best to avoid the problem.

This patch flags a zone as needing a kswapd using the, surprisingly,
unused zone flag field.  The flag is read without the lock held to do
the wakeup.  It's possible that the flag setting context is not the same
as the flag clearing context or for small races to occur.  However, each
race possibility is harmless and there is no visible degredation in
fragmentation treatment.

While zone->flag could have continued to be unused, there is potential
for moving some existing fields into the flags field instead.
Particularly read-mostly ones like zone->initialized and
zone->contiguous.

Link: http://lkml.kernel.org/r/20190103225712.GJ31517@techsingularity.net
Fixes: 1c30844d2d ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Reported-by: syzbot+93d94a001cfbce9e60e1@syzkaller.appspotmail.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Mike Kravetz
ddeaab32a8 hugetlbfs: revert "use i_mmap_rwsem for more pmd sharing synchronization"
This reverts b43a999005

The reverted commit caused issues with migration and poisoning of anon
huge pages.  The LTP move_pages12 test will cause an "unable to handle
kernel NULL pointer" BUG would occur with stack similar to:

  RIP: 0010:down_write+0x1b/0x40
  Call Trace:
    migrate_pages+0x81f/0xb90
    __ia32_compat_sys_migrate_pages+0x190/0x190
    do_move_pages_to_node.isra.53.part.54+0x2a/0x50
    kernel_move_pages+0x566/0x7b0
    __x64_sys_move_pages+0x24/0x30
    do_syscall_64+0x5b/0x180
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

The purpose of the reverted patch was to fix some long existing races
with huge pmd sharing.  It used i_mmap_rwsem for this purpose with the
idea that this could also be used to address truncate/page fault races
with another patch.  Further analysis has determined that i_mmap_rwsem
can not be used to address all these hugetlbfs synchronization issues.
Therefore, revert this patch while working an another approach to the
underlying issues.

Link: http://lkml.kernel.org/r/20190103235452.29335-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Mike Kravetz
e7c5809779 hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race"
This reverts c86aa7bbfd

The reverted commit caused ABBA deadlocks when file migration raced with
file eviction for specific hugetlbfs files.  This was discovered with a
modified version of the LTP move_pages12 test.

The purpose of the reverted patch was to close a long existing race
between hugetlbfs file truncation and page faults.  After more analysis
of the patch and impacted code, it was determined that i_mmap_rwsem can
not be used for all required synchronization.  Therefore, revert this
patch while working an another approach to the underlying issue.

Link: http://lkml.kernel.org/r/20190103235452.29335-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Jan Stancek
8ab88c7169 mm: page_mapped: don't assume compound page is huge or THP
LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
    page_mapped+0x78/0xb4
    stable_page_flags+0x27c/0x338
    kpageflags_read+0xfc/0x164
    proc_reg_read+0x7c/0xb8
    __vfs_read+0x58/0x178
    vfs_read+0x90/0x14c
    SyS_read+0x60/0xc0

The issue is that page_mapped() assumes that if compound page is not
huge, then it must be THP.  But if this is 'normal' compound page
(COMPOUND_PAGE_DTOR), then following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:

        for (i = 0; i < hpage_nr_pages(page); i++) {
                if (atomic_read(&page[i]._mapcount) >= 0)
                        return true;
	}

I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
 - allocates compound page (PAGEC) of order 1
 - allocates 2 normal pages (COPY), which are initialized to 0xff (to
   satisfy _mapcount >= 0)
 - 2 PAGEC page structs are copied to address of first COPY page
 - second page of COPY is marked as not present
 - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
   page at offset 0x30 (_mapcount)

[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c

Fix the loop to iterate for "1 << compound_order" pages.

Kirrill said "IIRC, sound subsystem can producuce custom mapped compound
pages".

Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
Fixes: e1534ae950 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Matthew Wilcox
1ed7293ac4 mm/memory.c: initialise mmu_notifier_range correctly
One of the paths in follow_pte_pmd() initialised the mmu_notifier_range
incorrectly.

Link: http://lkml.kernel.org/r/20190103002126.GM6310@bombadil.infradead.org
Fixes: ac46d4f3c4 ("mm/mmu_notifier: use structure for invalidate_range_start/end calls v2")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Miles Chen
aff876dcf4 tools/vm/page_owner: use page_owner_sort in the use example
The example in comment does not useable because the output binary is
named "page_owner_sort", not "sort".

Also add a reference to Documentation/vm/page_owner.rst

Link: http://lkml.kernel.org/r/1546515361-8317-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Andrey Konovalov
a3fe7cdf02 kasan: fix krealloc handling for tag-based mode
Right now tag-based KASAN can retag the memory that is reallocated via
krealloc and return a differently tagged pointer even if the same slab
object gets used and no reallocated technically happens.

There are a few issues with this approach.  One is that krealloc callers
can't rely on comparing the return value with the passed argument to
check whether reallocation happened.  Another is that if a caller knows
that no reallocation happened, that it can access object memory through
the old pointer, which leads to false positives.  Look at
nf_ct_ext_add() to see an example.

Fix this by keeping the same tag if the memory don't actually gets
reallocated during krealloc.

Link: http://lkml.kernel.org/r/bb2a71d17ed072bcc528cbee46fcbd71a6da3be4.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Andrey Konovalov
96fedce27e kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
With CONFIG_HARDENED_USERCOPY enabled __check_heap_object() compares and
then subtracts a potentially tagged pointer with a non-tagged address of
the page that this pointer belongs to, which leads to unexpected
behavior.

Untag the pointer in __check_heap_object() before doing any of these
operations.

Link: http://lkml.kernel.org/r/7e756a298d514c4482f52aea6151db34818d395d.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Andrey Konovalov
eb214f2dda kasan, arm64: use ARCH_SLAB_MINALIGN instead of manual aligning
Instead of changing cache->align to be aligned to KASAN_SHADOW_SCALE_SIZE
in kasan_cache_create() we can reuse the ARCH_SLAB_MINALIGN macro.

Link: http://lkml.kernel.org/r/52ddd881916bcc153a9924c154daacde78522227.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Michal Hocko
63f3655f95 mm, memcg: fix reclaim deadlock with writeback
Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
ext4 writeback

  task1:
    wait_on_page_bit+0x82/0xa0
    shrink_page_list+0x907/0x960
    shrink_inactive_list+0x2c7/0x680
    shrink_node_memcg+0x404/0x830
    shrink_node+0xd8/0x300
    do_try_to_free_pages+0x10d/0x330
    try_to_free_mem_cgroup_pages+0xd5/0x1b0
    try_charge+0x14d/0x720
    memcg_kmem_charge_memcg+0x3c/0xa0
    memcg_kmem_charge+0x7e/0xd0
    __alloc_pages_nodemask+0x178/0x260
    alloc_pages_current+0x95/0x140
    pte_alloc_one+0x17/0x40
    __pte_alloc+0x1e/0x110
    alloc_set_pte+0x5fe/0xc20
    do_fault+0x103/0x970
    handle_mm_fault+0x61e/0xd10
    __do_page_fault+0x252/0x4d0
    do_page_fault+0x30/0x80
    page_fault+0x28/0x30

  task2:
    __lock_page+0x86/0xa0
    mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
    ext4_writepages+0x479/0xd60
    do_writepages+0x1e/0x30
    __writeback_single_inode+0x45/0x320
    writeback_sb_inodes+0x272/0x600
    __writeback_inodes_wb+0x92/0xc0
    wb_writeback+0x268/0x300
    wb_workfn+0xb4/0x390
    process_one_work+0x189/0x420
    worker_thread+0x4e/0x4b0
    kthread+0xe6/0x100
    ret_from_fork+0x41/0x50

He adds
 "task1 is waiting for the PageWriteback bit of the page that task2 has
  collected in mpd->io_submit->io_bio, and tasks2 is waiting for the
  LOCKED bit the page which tasks1 has locked"

More precisely task1 is handling a page fault and it has a page locked
while it charges a new page table to a memcg.  That in turn hits a
memory limit reclaim and the memcg reclaim for legacy controller is
waiting on the writeback but that is never going to finish because the
writeback itself is waiting for the page locked in the #PF path.  So
this is essentially ABBA deadlock:

                                        lock_page(A)
                                        SetPageWriteback(A)
                                        unlock_page(A)
  lock_page(B)
                                        lock_page(B)
  pte_alloc_pne
    shrink_page_list
      wait_on_page_writeback(A)
                                        SetPageWriteback(B)
                                        unlock_page(B)

                                        # flush A, B to clear the writeback

This accumulating of more pages to flush is used by several filesystems
to generate a more optimal IO patterns.

Waiting for the writeback in legacy memcg controller is a workaround for
pre-mature OOM killer invocations because there is no dirty IO
throttling available for the controller.  There is no easy way around
that unfortunately.  Therefore fix this specific issue by pre-allocating
the page table outside of the page lock.  We have that handy
infrastructure for that already so simply reuse the fault-around pattern
which already does this.

There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
from under a fs page locked but they should be really rare.  I am not
aware of a better solution unfortunately.

[akpm@linux-foundation.org: fix mm/memory.c:__do_fault()]
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: enhance comment, per Johannes]
  Link: http://lkml.kernel.org/r/20181214084948.GA5624@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20181213092221.27270-1-mhocko@kernel.org
Fixes: c3b94f44fc ("memcg: further prevent OOM with too many dirty pages")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Liu Bo <bo.liu@linux.alibaba.com>
Debugged-by: Liu Bo <bo.liu@linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Qian Cai
7bff3c0699 mm/usercopy.c: no check page span for stack objects
It is easy to trigger this with CONFIG_HARDENED_USERCOPY_PAGESPAN=y,

  usercopy: Kernel memory overwrite attempt detected to spans multiple pages (offset 0, size 23)!
  kernel BUG at mm/usercopy.c:102!

For example,

print_worker_info
char name[WQ_NAME_LEN] = { };
char desc[WORKER_DESC_LEN] = { };
  probe_kernel_read(name, wq->name, sizeof(name) - 1);
  probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
    __copy_from_user_inatomic
      check_object_size
        check_heap_object
          check_page_span

This is because on-stack variables could cross PAGE_SIZE boundary, and
failed this check,

if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
	   ((unsigned long)end & (unsigned long)PAGE_MASK)))

ptr = FFFF889007D7EFF8
end = FFFF889007D7F00E

Hence, fix it by checking if it is a stack object first.

[keescook@chromium.org: improve comments after reorder]
  Link: http://lkml.kernel.org/r/20190103165151.GA32845@beast
Link: http://lkml.kernel.org/r/20181231030254.99441-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Christoph Lameter
09c2e76ed7 slab: alien caches must not be initialized if the allocation of the alien cache failed
Callers of __alloc_alien() check for NULL.  We must do the same check in
__alloc_alien_cache to avoid NULL pointer dereferences on allocation
failures.

Link: http://lkml.kernel.org/r/010001680f42f192-82b4e12e-1565-4ee0-ae1f-1e98974906aa-000000@email.amazonses.com
Fixes: 49dfc304ba ("slab: use the lock on alien_cache, instead of the lock on array_cache")
Fixes: c8522a3a58 ("Slab: introduce alloc_alien")
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: syzbot+d6ed4ec679652b4fd4e4@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Shakeel Butt
ba4a45746c fork, memcg: fix cached_stacks case
Commit 5eed6f1dff ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused due to failed memcg charge of
the kernel stack.  However the fix misses the cached_stacks case which
this patch fixes.  So, the same crash can happen if the memcg charge of
a cached stack is failed.

Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1dff ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:11 -08:00
Minchan Kim
1d69a3f8ae zram: idle writeback fixes and cleanup
This patch includes some fixes and cleanup for idle-page writeback.

1. writeback_limit interface

Now writeback_limit interface is rather conusing.  For example, once
writeback limit budget is exausted, admin can see 0 from
/sys/block/zramX/writeback_limit which is same semantic with disable
writeback_limit at this moment.  IOW, admin cannot tell that zero came
from disable writeback limit or exausted writeback limit.

To make the interface clear, let's sepatate enable of writeback limit to
another knob - /sys/block/zram0/writeback_limit_enable

* before:
  while true :
    # to re-enable writeback limit once previous one is used up
    echo 0 > /sys/block/zram0/writeback_limit
    echo $((200<<20)) > /sys/block/zram0/writeback_limit
    ..
    .. # used up the writeback limit budget

* new
  # To enable writeback limit, from the beginning, admin should
  # enable it.
  echo $((200<<20)) > /sys/block/zram0/writeback_limit
  echo 1 > /sys/block/zram/0/writeback_limit_enable
  while true :
    echo $((200<<20)) > /sys/block/zram0/writeback_limit
    ..
    .. # used up the writeback limit budget

It's much strightforward.

2. fix condition check idle/huge writeback mode check

The mode in writeback_store is not bit opeartion any more so no need to
use bit operations.  Furthermore, current condition check is broken in
that it does writeback every pages regardless of huge/idle.

3. clean up idle_store

No need to use goto.

[minchan@kernel.org: missed spin_lock_init]
  Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com
Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: John Dias <joaodias@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: Srinivas Paladugu <srnvs@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 17:15:10 -08:00
Jerome Brunet
19a220dd1e arm64: defconfig: enable modules for amlogic s400 sound card
Compile the necessary drivers as modules, including codecs, for the
s400 sound card.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2019-01-08 15:27:50 -08:00
Lyude Paul
c235316d93 drm/dp_mst: Add __must_check to drm_dp_mst_topology_mgr_resume()
Since I've had to fix two cases of drivers not checking the return code
from this function, let's make the compiler complain so this doesn't
come up again in the future.

Changes since v1:
* Remove unneeded __must_check in function declaration - danvet

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190108211133.32564-4-lyude@redhat.com
2019-01-08 17:58:27 -05:00
Lyude Paul
2d1af6a11c drm/amdgpu: Don't fail resume process if resuming atomic state fails
This is an ugly one unfortunately. Currently, all DRM drivers supporting
atomic modesetting will save the state that userspace had set before
suspending, then attempt to restore that state on resume. This probably
worked very well at one point, like many other things, until DP MST came
into the picture. While it's easy to restore state on normal display
connectors that were disconnected during suspend regardless of their
state post-resume, this can't really be done with MST because of the
fact that setting up a downstream sink requires performing sideband
transactions between the source and the MST hub, sending out the ACT
packets, etc.

Because of this, there isn't really a guarantee that we can restore the
atomic state we had before suspend once we've resumed. This sucks pretty
bad, but so far I haven't run into any compositors that this actually
causes serious issues with. Most compositors will notice the hotplug we
send afterwards, and then reprobe state.

Since nouveau and i915 also don't fail the suspend/resume process due to
failing to restore the atomic state, let's make amdgpu match this
behavior. Better to resume the GPU properly, then to stop the process
half way because of a potentially unavoidable atomic commit failure.

Eventually, we'll have a real fix for this problem on the DRM level. But
we've got some more important low-hanging fruit to deal with first.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: <stable@vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20190108211133.32564-3-lyude@redhat.com
2019-01-08 17:58:26 -05:00
Lyude Paul
fe7553bef8 drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
drm_dp_mst_topology_mgr_resume() returns whether or not it managed to
find the topology in question after a suspend resume cycle, and the
driver is supposed to check this value and disable MST accordingly if
it's gone-in addition to sending a hotplug in order to notify userspace
that something changed during suspend.

Currently, amdgpu just makes the mistake of ignoring the return code
from drm_dp_mst_topology_mgr_resume() which means that if a topology was
removed in suspend, amdgpu never notices and assumes it's still
connected which leads to all sorts of problems.

So, fix this by actually checking the rc from
drm_dp_mst_topology_mgr_resume(). Also, reformat the rest of the
function while we're at it to fix the over-indenting.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: <stable@vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20190108211133.32564-2-lyude@redhat.com
2019-01-08 17:58:24 -05:00
Amadeusz Sławiński
f5c9571e22 ALSA: usb-audio: fix CM6206 register definitions
fix typo after a recent commit causing headphones to have no sound

Fixes: ad43d528a7 (ALSA: usb-audio: Define registers for CM6206)
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-01-08 22:51:44 +01:00
Yu Zhao
c4a32b266d drm/amdgpu: validate user GEM object size
When creating frame buffer, userspace may request to attach to a
previously allocated GEM object that is smaller than what GPU
requires. Validation must be done to prevent out-of-bound DMA,
otherwise it could be exploited to reveal sensitive data.

This fix is not done in a common code path because individual
driver might have different requirement.

Cc: stable@vger.kernel.org # v4.2+
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:50 -05:00
Yu Zhao
89f23b6efe drm/amdgpu: validate user pitch alignment
Userspace may request pitch alignment that is not supported by GPU.
Some requests 32, but GPU ignores it and uses default 64 when cpp is
4. If GEM object is allocated based on the smaller alignment, GPU
DMA will go out of bound.

Cc: stable@vger.kernel.org # v4.2+
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:41 -05:00
Evan Quan
fadcb8f9fc drm/amd/powerplay: drop the unnecessary uclk hard min setting
Since soft min setting is enough. Hard min setting is redundant.

Reported-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:34 -05:00
Evan Quan
fff0d3f768 drm/amd/powerplay: avoid possible buffer overflow
Make sure the clock level enforced is within the allowed
ranges.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:26 -05:00
Evan Quan
0624e145fb drm/amd/powerplay: create pp_od_clk_voltage device file under OD support
Since pp_od_clk_voltage device file is for OD related sysfs operations.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:18 -05:00
Evan Quan
8139d493da drm/amd/powerplay: update OD support flag for SKU with no OD capabilities
For those ASICs with no overdrive capabilities, the OD support flag
will be reset.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-08 16:26:04 -05:00
David Herrmann
7b55851367 fork: record start_time late
This changes the fork(2) syscall to record the process start_time after
initializing the basic task structure but still before making the new
process visible to user-space.

Technically, we could record the start_time anytime during fork(2).  But
this might lead to scenarios where a start_time is recorded long before
a process becomes visible to user-space.  For instance, with
userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
for an indefinite amount of time (and will, if this causes network
access, or similar).

By recording the start_time late, it much closer reflects the point in
time where the process becomes live and can be observed by other
processes.

Lastly, this makes it much harder for user-space to predict and control
the start_time they get assigned.  Previously, user-space could fork a
process and stall it in copy_thread_tls() before its pid is allocated,
but after its start_time is recorded.  This can be misused to later-on
cycle through PIDs and resume the stalled fork(2) yielding a process
that has the same pid and start_time as a process that existed before.
This can be used to circumvent security systems that identify processes
by their pid+start_time combination.

Even though user-space was always aware that start_time recording is
flaky (but several projects are known to still rely on start_time-based
identification), changing the start_time to be recorded late will help
mitigate existing attacks and make it much harder for user-space to
control the start_time a process gets assigned.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-08 09:40:53 -08:00
Arnaldo Carvalho de Melo
ee412f1469 tools include uapi: Sync linux/vhost.h with the kernel sources
To get the changes in:

  4b86713236 ("vhost: split structs into a separate header file")

Silencing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
  diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

Those didn't touch things used in tools, i.e. the following continues
working:

  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh
  static const char *vhost_virtio_ioctl_cmds[] = {
          [0x00] = "SET_FEATURES",
          [0x01] = "SET_OWNER",
          [0x02] = "RESET_OWNER",
          [0x03] = "SET_MEM_TABLE",
          [0x04] = "SET_LOG_BASE",
          [0x07] = "SET_LOG_FD",
          [0x10] = "SET_VRING_NUM",
          [0x11] = "SET_VRING_ADDR",
          [0x12] = "SET_VRING_BASE",
          [0x13] = "SET_VRING_ENDIAN",
          [0x14] = "GET_VRING_ENDIAN",
          [0x20] = "SET_VRING_KICK",
          [0x21] = "SET_VRING_CALL",
          [0x22] = "SET_VRING_ERR",
          [0x23] = "SET_VRING_BUSYLOOP_TIMEOUT",
          [0x24] = "GET_VRING_BUSYLOOP_TIMEOUT",
          [0x25] = "SET_BACKEND_FEATURES",
          [0x30] = "NET_SET_BACKEND",
          [0x40] = "SCSI_SET_ENDPOINT",
          [0x41] = "SCSI_CLEAR_ENDPOINT",
          [0x42] = "SCSI_GET_ABI_VERSION",
          [0x43] = "SCSI_SET_EVENTS_MISSED",
          [0x44] = "SCSI_GET_EVENTS_MISSED",
          [0x60] = "VSOCK_SET_GUEST_CID",
          [0x61] = "VSOCK_SET_RUNNING",
  };
  static const char *vhost_virtio_ioctl_read_cmds[] = {
          [0x00] = "GET_FEATURES",
          [0x12] = "GET_VRING_BASE",
          [0x26] = "GET_BACKEND_FEATURES",
  };
  $

At some point in the eBPFication of perf, using something like:

	# perf trace -e ioctl(cmd=VHOST_VRING*)

Will setup a BPF filter right at the raw_syscalls:sys_enter tracepoint,
i.e. filtering at the origin.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-g28usrt7l59lwq3wuh8vzbig@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 14:09:33 -03:00
Arnaldo Carvalho de Melo
fdc42ca190 tools include uapi: Sync linux/fs.h copy with the kernel sources
To get the changes in:

  e262e32d6b ("vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled")

That made the mount flags string table generator to switch to using
mount.h instead.

This silences the following perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
  diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mosz81pa6iwxko4p2owbm3ss@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 14:09:33 -03:00
Arnaldo Carvalho de Melo
1c23397d2a perf beauty: Switch from using uapi/linux/fs.h to uapi/linux/mount.h
As now we'll update our fs.h copy and what tools/perf/trace/beauty/mount_flags.sh
needs just got moved to mount.h, use that instead.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ls19h376xukeouxrw9dswkcn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 14:09:33 -03:00
Arnaldo Carvalho de Melo
250bfc87dd tools include uapi: Grab a copy of linux/mount.h
We were using a copy of uapi/linux/fs.h to create the mount syscall
'flags' string table to use in 'perf trace', to convert from the number
obtained via the raw_syscalls:sys_enter into a string, using
tools/perf/trace/beauty/mount_flags.sh, but in e262e32d6b ("vfs:
Suppress MS_* flag defs within the kernel unless explicitly enabled")
those defines got moved to linux/mount.h, so grab a copy of mount.h too.

Keep the uapi/linux/fs.h as we'll use it for the SEEK_ constants.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i2ricmpwpdrpukfq3298jr1z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 14:09:28 -03:00
Alex Williamson
58fec830fc vfio/type1: Fix unmap overflow off-by-one
The below referenced commit adds a test for integer overflow, but in
doing so prevents the unmap ioctl from ever including the last page of
the address space.  Subtract one to compare to the last address of the
unmap to avoid the overflow and wrap-around.

Fixes: 71a7d3d78e ("vfio/type1: silence integer overflow warning")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Cc: stable@vger.kernel.org # v4.15+
Reported-by: Pei Zhang <pezhang@redhat.com>
Debugged-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-01-08 09:31:28 -07:00
Arnaldo Carvalho de Melo
f2e14cd2c9 perf top: Lift restriction on using callchains without "sym" in --sort
This restriction is not present in 'perf report' and since 'perf top'
uses the same hists browser, remove it from it as well.

With this we create per event buckets with callchain trees, so that

  # perf top --sort dso -g --no-children

Bucketizes samples by DSO and below it shows the callchains leading to
functions in this DSO.

Try also:

  # perf top -e sched:*switch -g --no-children

To see the callchains leading to sched switches, pressing 'E' to expand
all one can quickly see the most common scheduler switches and what
leads to them, for instance, calls to IO, futexes, etc.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lkml.kernel.org/r/20190107140854.GA28965@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
9231967e2f tools lib traceevent: Remove tep_data_event_from_type() API
In order to make libtraceevent into a proper library, its API
should be straightforward.

After discussion with Steven Rostedt, we decided to remove the
tep_data_event_from_type() API and to replace it with tep_find_event(),
as it does the same.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.913841066@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
4104e60427 tools lib traceevent: Rename tep_is_file_bigendian() to tep_file_bigendian()
In order to make libtraceevent into a proper library, its API
should be straightforward.

After a discussion with Steven Rostedt, we decided to rename a few APIs,
to have more intuitive names.

This patch renames tep_is_file_bigendian() to tep_file_bigendian().

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.767549746@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
f87ce7c43f tools lib traceevent: Changed return logic of tep_register_event_handler() API
In order to make libtraceevent into a proper library, its API
should be straightforward.

The tep_register_event_handler() functions returns -1 in case it
successfully registers the new event handler. Such return code is used
by the other library APIs in case of an error.

To unify the return logic of tep_register_event_handler() with the other
APIs, this patch introduces enum tep_reg_handler, which is used by this
function as return value, to handle all possible successful return
cases.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.628034497@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
6d2d6fd7e3 tools lib traceevent: Changed return logic of trace_seq_printf() and trace_seq_vprintf() APIs
In order to make libtraceevent into a proper library, its API should be
straightforward.

The trace_seq_printf() and trace_seq_vprintf() APIs have inconsistent
returned values with the other trace_seq_* APIs.

This path changes the return logic of trace_seq_printf() and
trace_seq_vprintf() to return the number of printed characters, as the
other trace_seq_* related APIs.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.485792891@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
2e4318a287 tools lib traceevent: Rename struct cmdline to struct tep_cmdline
In order to make libtraceevent a proper library, variables, data
structures and functions should have a unique prefix to prevent name
space conflicts. That prefix will be "tep_".

This patch renames 'struct cmdline' to 'struct tep_cmdline'.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.358871851@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
eed14f4b07 tools lib traceevent: Initialize host_bigendian at tep_handle allocation
This patch initializes the host_bigendian member of the tep_handle
structure with the byte order of the current host, when this handler is
created - in tep_alloc() API. We need this in order to remove the
tep_set_host_bigendian() API.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201040852.216292134@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Tzvetomir Stoyanov
ca3958b1c0 tools lib traceevent: Introduce new libtracevent API: tep_override_comm()
This patch adds a new API of tracevent library: tep_override_comm() It
registers a pid / command mapping. If a mapping with the same pid
already exists, the entry is updated with the new command.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181130154648.038915912@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Florian Fainelli
21327c7843 perf tests: Add a test for the ARM 32-bit [vectors] page
perf on ARM requires CONFIG_KUSER_HELPERS to be turned on to allow some
independance with respect to the ARM CPU being used. Add a test which
tries to locate the [vectors] page, created when CONFIG_KUSER_HELPERS is
turned on to help asses the system's health.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lkml.kernel.org/r/20181221034337.26663-3-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Florian Fainelli
011532379b perf tools: Make find_vdso_map() more modular
In preparation for checking that the vectors page on the ARM
architecture, refactor the find_vdso_map() function to accept finding an
arbitrary string and create a dedicated helper function for that under
util/find-map.c and update the filename to find-map.c and all references
to it: perf-read-vdso.c and util/vdso.c.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lkml.kernel.org/r/20181221034337.26663-2-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Arnaldo Carvalho de Melo
ac6e022cbf perf trace: Fix alignment for [continued] lines
We were not taking into account the "... [continued]" printed
characters, fix it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qt20y0acmf8k0bzisce8kw95@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:12 -03:00
Arnaldo Carvalho de Melo
172bf02d56 perf trace: Fix ')' placement in "interrupted" syscall lines
When we get the sys_enter for a syscall we check if the last one is
still waiting for its matching sys_exit, if so we print this:

   468.753 (         ): firefox/32382 poll(ufds: 0x7f3988d3dd00, nfds: 7, timeout_msecs: 4294967295)     ...
   449.575 ( 0.004 ms): Softwar~cThrea/32434 futex(uaddr: 0x7f39a18a9b70, op: WAKE|PRIVATE_FLAG, val: 1)           = 0

At some point we'll get that poll sys_exit event and will print a "[continued]" line.

While making the sizing of the alignment after the syscall arg list and
its result configurable, so that we can mimic strace, which uses a
smaller alingment by default, a bug was introduced where the closing
parens appeared before the syscall name and its arg list, fix it.

Fixes: 4b8a240ed5 ("perf trace: Add alignment spaces after the closing parens")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oi45i54s59h1w1kmgpzrfuum@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:12 -03:00
Guo Ren
56752b2175 irqchip/csky: fixup handle_irq_perbit break irq
The handle_irq_perbit function loop every bit in hwirq local variable.

handle_irq_perbit(hwirq) {
  for_everyt_bit_in(hwirq) {
	handle_domain_irq()
		->irq_exit()
		->invoke_softirq()
		->__do_softirq()
		->local_irq_enable() // Here will cause new interrupt.
  }
}

When new interrupt coming at local_irq_enable, it will finish another
interrupt handler and pull down the interrupt source. But hwirq is the
local variable for handle_irq_perbit(), it can't get new interrupt
controller pending reg status. So we need update hwirq with pending reg
in every loop.

Also change write_relax to writel could prevent stw from fast retire.
When local_irq is enabled, intc regs is really set-in.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Cc: Lu Baoquan <lu.baoquan@intellif.com>
2019-01-09 00:18:46 +08:00