Commit graph

1135997 commits

Author SHA1 Message Date
Palmer Dabbelt
b214fadff2 MAINTAINERS: git://github.com -> https://github.com for nilfs2
Github deprecated the git:// links about a year ago, so let's move to the
https:// URLs instead.

Link: https://lkml.kernel.org/r/20221020024255.5000-1-konishi.ryusuke@gmail.com
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Link: https://lkml.kernel.org/r/20221013214638.30933-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:22 -07:00
Waiman Long
984a608377 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
Commit 6edda04ccc ("mm/kmemleak: prevent soft lockup in first object
iteration loop of kmemleak_scan()") adds cond_resched() in the first
object iteration loop of kmemleak_scan().  However, it turns that the 2nd
objection iteration loop can still cause soft lockup to happen in some
cases.  So add a cond_resched() call in the 2nd and 3rd loops as well to
prevent that and for completeness.

Link: https://lkml.kernel.org/r/20221020175619.366317-1-longman@redhat.com
Fixes: 6edda04ccc ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:22 -07:00
Phillip Lougher
e11c4e088b squashfs: fix buffer release race condition in readahead code
Fix a buffer release race condition, where the error value was used after
release.

Link: https://lkml.kernel.org/r/20221020223616.7571-4-phillip@squashfs.org.uk
Fixes: b09a7a036d ("squashfs: support reading fragments in readahead call")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Slade Watkins <srw@sladewatkins.net>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Phillip Lougher
c9199de82b squashfs: fix extending readahead beyond end of file
The readahead code will try to extend readahead to the entire size of the
Squashfs data block.

But, it didn't take into account that the last block at the end of the
file may not be a whole block.  In this case, the code would extend
readahead to beyond the end of the file, leaving trailing pages.

Fix this by only requesting the expected number of pages.

Link: https://lkml.kernel.org/r/20221020223616.7571-3-phillip@squashfs.org.uk
Fixes: 8fc78b6fe2 ("squashfs: implement readahead")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Slade Watkins <srw@sladewatkins.net>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Phillip Lougher
9ef8eb6104 squashfs: fix read regression introduced in readahead code
Patch series "squashfs: fix some regressions introduced in the readahead
code".

This patchset fixes 3 regressions introduced by the recent readahead code
changes.  The first regression is causing "snaps" to randomly fail after a
couple of hours or days, which how the regression came to light.


This patch (of 3):

If a file isn't a whole multiple of the page size, the last page will have
trailing bytes unfilled.

There was a mistake in the readahead code which did this.  In particular
it incorrectly assumed that the last page in the readahead page array
(page[nr_pages - 1]) will always contain the last page in the block, which
if we're at file end, will be the page that needs to be zero filled.

But the readahead code may not return the last page in the block, which
means it is unmapped and will be skipped by the decompressors (a temporary
buffer used).

In this case the zero filling code will zero out the wrong page, leading
to data corruption.

Fix this by by extending the "page actor" to return the last page if
present, or NULL if a temporary buffer was used.

Link: https://lkml.kernel.org/r/20221020223616.7571-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20221020223616.7571-2-phillip@squashfs.org.uk
Fixes: 8fc78b6fe2 ("squashfs: implement readahead")
Link: https://lore.kernel.org/lkml/b0c258c3-6dcf-aade-efc4-d62a8b3a1ce2@alu.unizg.hr/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Slade Watkins <srw@sladewatkins.net>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Alistair Popple
97061d4411 nouveau: fix migrate_to_ram() for faulting page
Commit 16ce101db8 ("mm/memory.c: fix race when faulting a device private
page") changed the migrate_to_ram() callback to take a reference on the
device page to ensure it can't be freed while handling the fault. 
Unfortunately the corresponding update to Nouveau to accommodate this
change was inadvertently dropped from that patch causing GPU to CPU
migration to fail so add it here.

Link: https://lkml.kernel.org/r/20221019122934.866205-1-apopple@nvidia.com
Fixes: 16ce101db8 ("mm/memory.c: fix race when faulting a device private page")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:24 -07:00
Mel Gorman
71e2d666ef mm/huge_memory: do not clobber swp_entry_t during THP split
The following has been observed when running stressng mmap since commit
b653db7735 ("mm: Clear page->private when splitting or migrating a page")

   watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
   CPU: 75 PID: 9546 Comm: stress-ng Tainted: G            E      6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
   Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
   RIP: 0010:xas_descend+0x28/0x80
   Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
   RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
   RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
   RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
   RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
   R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
   R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
   FS:  00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    <TASK>
    xas_load+0x3a/0x50
    __filemap_get_folio+0x80/0x370
    ? put_swap_page+0x163/0x360
    pagecache_get_page+0x13/0x90
    __try_to_reclaim_swap+0x50/0x190
    scan_swap_map_slots+0x31e/0x670
    get_swap_pages+0x226/0x3c0
    folio_alloc_swap+0x1cc/0x240
    add_to_swap+0x14/0x70
    shrink_page_list+0x968/0xbc0
    reclaim_page_list+0x70/0xf0
    reclaim_pages+0xdd/0x120
    madvise_cold_or_pageout_pte_range+0x814/0xf30
    walk_pgd_range+0x637/0xa30
    __walk_page_range+0x142/0x170
    walk_page_range+0x146/0x170
    madvise_pageout+0xb7/0x280
    ? asm_common_interrupt+0x22/0x40
    madvise_vma_behavior+0x3b7/0xac0
    ? find_vma+0x4a/0x70
    ? find_vma+0x64/0x70
    ? madvise_vma_anon_name+0x40/0x40
    madvise_walk_vmas+0xa6/0x130
    do_madvise+0x2f4/0x360
    __x64_sys_madvise+0x26/0x30
    do_syscall_64+0x5b/0x80
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? syscall_exit_to_user_mode+0x17/0x40
    ? do_syscall_64+0x67/0x80
    ? do_syscall_64+0x67/0x80
    ? common_interrupt+0x8b/0xa0
    entry_SYSCALL_64_after_hwframe+0x63/0xcd

The problem can be reproduced with the mmtests config
config-workload-stressng-mmap.  It does not always happen and when it
triggers is variable but it has happened on multiple machines.

The intent of commit b653db7735 patch was to avoid the case where
PG_private is clear but folio->private is not-NULL.  However, THP tail
pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
stated in the documentation for struct folio.  This patch only clobbers
page->private for tail pages if the head page was not in swapcache and
warns once if page->private had an unexpected value.

Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
Fixes: b653db7735 ("mm: Clear page->private when splitting or migrating a page")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:24 -07:00
Mike Kravetz
612b8a3170 hugetlb: fix memory leak associated with vma_lock structure
The hugetlb vma_lock structure hangs off the vm_private_data pointer of
sharable hugetlb vmas.  The structure is vma specific and can not be
shared between vmas.  At fork and various other times, vmas are duplicated
via vm_area_dup().  When this happens, the pointer in the newly created
vma must be cleared and the structure reallocated.  Two hugetlb specific
routines deal with this hugetlb_dup_vma_private and hugetlb_vm_op_open. 
Both routines are called for newly created vmas.  hugetlb_dup_vma_private
would always clear the pointer and hugetlb_vm_op_open would allocate the
new vms_lock structure.  This did not work in the case of this calling
sequence pointed out in [1].

  move_vma
    copy_vma
      new_vma = vm_area_dup(vma);
      new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
    is_vm_hugetlb_page(vma)
      clear_vma_resv_huge_pages
        hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL

When clearing hugetlb_dup_vma_private we actually leak the associated
vma_lock structure.

The vma_lock structure contains a pointer to the associated vma.  This
information can be used in hugetlb_dup_vma_private and hugetlb_vm_op_open
to ensure we only clear the vm_private_data of newly created (copied)
vmas.  In such cases, the vma->vma_lock->vma field will not point to the
vma.

Update hugetlb_dup_vma_private and hugetlb_vm_op_open to not clear
vm_private_data if vma->vma_lock->vma == vma.  Also, log a warning if
hugetlb_vm_op_open ever encounters the case where vma_lock has already
been correctly allocated for the vma.

[1] https://lore.kernel.org/linux-mm/5154292a-4c55-28cd-0935-82441e512fc3@huawei.com/

Link: https://lkml.kernel.org/r/20221019201957.34607-1-mike.kravetz@oracle.com
Fixes: 131a79b474 ("hugetlb: fix vma lock handling during split vma and range unmapping")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Liam R. Howlett
df48a5f7a3 mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
Try to avoid using the left over split page on the next request for a page
by calling __free_pages_ok() with FPI_TO_TAIL.  This increases the
potential of defragmenting memory when it's used for a short period of
time.

Link: https://lkml.kernel.org/r/20220531185626.yvlmymbxyoe5vags@revolver
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Hugh Dickins
08ac85521c mm: /proc/pid/smaps_rollup: fix maple tree search
/proc/pid/smaps_rollup showed 0 kB for everything: now find first vma.

Link: https://lkml.kernel.org/r/3011bee7-182-97a2-1083-d5f5b688e54b@google.com
Fixes: c4c84f0628 ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Rik van Riel
12df140f0b mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
The h->*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.

This could lead to a corrupted value of h->resv_huge_pages, which we have
observed on our systems.

Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
potential race.

Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c769548 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glen McCready <gkmccready@meta.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Liam Howlett
a57b70519d mm/mmap: fix MAP_FIXED address return on VMA merge
mmap should return the start address of newly mapped area when successful.
On a successful merge of a VMA, the return address was changed and thus
was violating that expectation from userspace.

This is a restoration of functionality provided by 309d08d9b3
(mm/mmap.c: fix mmap return value when vma is merged after call_mmap()). 
For completeness of fixing MAP_FIXED, implement the comments from the
previous discussion to never update the address and fail if the address
changes.  Leaving the error as a WARN_ON() to avoid crashing the kernel.

Link: https://lkml.kernel.org/r/20221018191613.4133459-1-Liam.Howlett@oracle.com
Link: https://lore.kernel.org/all/Y06yk66SKxlrwwfb@lakrids/
Link: https://lore.kernel.org/all/20201203085350.22624-1-liuzixian4@huawei.com/
Fixes: 4dd1b84140 ("mm/mmap: use advanced maple tree API for mmap_region()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Cc: Liu Zixian <liuzixian4@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Andrew Morton
1cd916d034 mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
The code is OK, but it fools gcc.

mm/mmap.c:802 __vma_adjust() error: uninitialized symbol 'next_next'.

Fixes: 524e00b36e ("mm: remove rb tree.")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:23 -07:00
Mike Kravetz
5789151e48 mm/mmap: undo ->mmap() when mas_preallocate() fails
A memory leak in hugetlb_reserve_pages was reported in [1].  The root
cause was traced to an error path in mmap_region when mas_preallocate()
fails.  In this case, the vma is freed after a successful call to
filesystem specific mmap.  The hugetlbfs mmap routine may allocate data
structures pointed to by m_private_data.  These need to be cleaned up by
the hugetlb vm_ops->close() routine.

The same issue was addressed by commit deb0f65628 ("mm/mmap: undo
->mmap() when arch_validate_flags() fails") for the arch_validate_flags()
test.  Go to the same close_and_free_vma label if mas_preallocate() fails.

[1] https://lore.kernel.org/linux-mm/CAKXUXMxf7OiCwbxib7MwfR4M1b5+b3cNTU7n5NV9Zm4967=FPQ@mail.gmail.com/

Link: https://lkml.kernel.org/r/20221018024945.415036-1-mike.kravetz@oracle.com
Fixes: d4af56c5c7 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:22 -07:00
Colin Ian King
eacf96d23f init: Kconfig: fix spelling mistake "satify" -> "satisfy"
There is a spelling mistake in a Kconfig description.  Fix it.

Link: https://lkml.kernel.org/r/20221007204339.2757753-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:22 -07:00
Joseph Qi
28f4821b1b ocfs2: clear dinode links count in case of error
In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.

So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock.  So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well.  Also do the same change for ocfs2_symlink().

Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:22 -07:00
Joseph Qi
759a7c6126 ocfs2: fix BUG when iput after ocfs2_mknod fails
Commit b1529a41f7 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later.  But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g.  i_generation.  Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)

We could make this inode as bad, but we did want to do operations like
wipe in some cases.  Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem.  So just leave it as is by revert the reclaim logic.

Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com
Fixes: b1529a41f7 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:22 -07:00
Martin Liska
977ef30a7d gcov: support GCC 12.1 and newer compilers
Starting with GCC 12.1, the created .gcda format can't be read by gcov
tool.  There are 2 significant changes to the .gcda file format that
need to be supported:

a) [gcov: Use system IO buffering]
   (23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed that all sizes in
   the format are in bytes and not in words (4B)

b) [gcov: make profile merging smarter]
   (72e0c742bd01f8e7e6dcca64042b9ad7e75979de) add a new checksum to the
   file header.

Tested with GCC 7.5, 10.4, 12.2 and the current master.

Link: https://lkml.kernel.org/r/624bda92-f307-30e9-9aaa-8cc678b2dfb2@suse.cz
Signed-off-by: Martin Liska <mliska@suse.cz>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:22 -07:00
Alexey Romanov
4249a05ff6 zsmalloc: zs_destroy_pool: add size_class NULL check
Inside the zs_destroy_pool() function, there can still be NULL size_class
pointers: if when the next size_class is allocated, inside
zs_create_pool() function, kzalloc will return NULL and handling the error
condition, zs_create_pool() will call zs_destroy_pool().

Link: https://lkml.kernel.org/r/20221013112825.61869-1-avromanov@sberdevices.ru
Fixes: f24263a5a0 ("zsmalloc: remove unnecessary size_class NULL check")
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:21 -07:00
Liam Howlett
7329e3ebe3 mm/mempolicy: fix mbind_range() arguments to vma_merge()
Fuzzing produced an invalid argument to vma_merge() which was caught by
the newly added verification of the number of VMAs being removed on
process exit.  Analyzing the failure eventually resulted in finding an
issue with the search of a VMA that started at address 0, which caused an
underflow and thus the loss of many VMAs being tracked in the tree.  Fix
the underflow by changing the search of the maple tree to use the start
address directly.

Link: https://lkml.kernel.org/r/20221015021135.2816178-1-Liam.Howlett@oracle.com
Fixes: 66850be55e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
  Link: https://lore.kernel.org/r/202210052318.5ad10912-oliver.sang@intel.com
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:21 -07:00
Qais Yousef
cef408e70e mailmap: update email for Qais Yousef
Update my email address for old entry and add a new entry for my
contribution while working with arm to continue support that work.

Link: https://lkml.kernel.org/r/20221014141016.539625-1-qyousef@layalina.io
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Acked-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Qais Yousef <qsyousef@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:21 -07:00
Dan Carpenter
5ad15f1b32 mailmap: update Dan Carpenter's email address
My time at Oracle is ending at the end of the month.  Update my email
address accordingly.

Link: https://lkml.kernel.org/r/Y0a+6+5SHMdvUnpg@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-20 21:27:21 -07:00
Andrew Morton
8048b83580 Merge branch 'master' into mm-hotfixes-stable 2022-10-16 16:09:11 -07:00
Andrew Morton
280330fac4 Merge branch 'master' into mm-hotfixes-stable 2022-10-16 16:06:53 -07:00
Linus Torvalds
9abf2313ad Linux 6.1-rc1 2022-10-16 15:36:24 -07:00
Linus Torvalds
f1947d7c8a Random number generator fixes for Linux 6.1-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
 A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
 qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
 R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
 lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
 MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
 8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
 XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
 Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
 vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
 4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
 lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
 =M+mV
 -----END PGP SIGNATURE-----

Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds
8636df94ec perf tools changes for v6.1: 2nd batch
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
   when using bperf (perf BPF based counters) with cgroups.
 
 - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
   monitors bandwidth, latency, bus utilization and buffer occupancy.
 
   Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
 
 - User space tasks can migrate between CPUs, so when tracing selected
   CPUs, system-wide sideband is still needed, fix it in the setup of
   Intel PT on hybrid systems.
 
 - Fix metricgroups title message in 'perf list', it should state that
   the metrics groups are to be used with the '-M' option, not '-e'.
 
 - Sync the msr-index.h copy with the kernel sources, adding support
   for using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as
   well as decoding it when printing the MSR tracepoint arguments.
 
 - Fix program header size and alignment when generating a JIT ELF
   in 'perf inject'.
 
 - Add multiple new Intel PT 'perf test' entries, including a jitdump one.
 
 - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
   running on PowerPC due to an invalid topology number in that arch.
 
 - Fix the 'perf test' for arm_coresight failures on the ARM Juno system.
 
 - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this option
   to the or expression expected in the intercepted perf_event_open() syscall.
 
 - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the 'perf
   annotate' asm parser.
 
 - Fix 'perf mem record -C' option processing, it was being chopped up
   when preparing the underlying 'perf record -e mem-events' and thus being
   ignored, requiring using '-- -C CPUs' as a workaround.
 
 - Improvements and tidy ups for 'perf test' shell infra.
 
 - Fix Intel PT information printing segfault in uClibc, where a NULL
   format was being passed to fprintf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY0vyXAAKCRCyPKLppCJ+
 J3EgAQDgr9FhTCTG+u46iGqPG4lxc46ZWKB3MgZwPuX6P2jwLwD9GCwGow4qHQVP
 F/m7S/3tK/ShPfPWB2m4nVHd9xp7uwM=
 =F1IB
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

 - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
   when using bperf (perf BPF based counters) with cgroups.

 - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
   monitors bandwidth, latency, bus utilization and buffer occupancy.

   Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.

 - User space tasks can migrate between CPUs, so when tracing selected
   CPUs, system-wide sideband is still needed, fix it in the setup of
   Intel PT on hybrid systems.

 - Fix metricgroups title message in 'perf list', it should state that
   the metrics groups are to be used with the '-M' option, not '-e'.

 - Sync the msr-index.h copy with the kernel sources, adding support for
   using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
   as decoding it when printing the MSR tracepoint arguments.

 - Fix program header size and alignment when generating a JIT ELF in
   'perf inject'.

 - Add multiple new Intel PT 'perf test' entries, including a jitdump
   one.

 - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
   running on PowerPC due to an invalid topology number in that arch.

 - Fix the 'perf test' for arm_coresight failures on the ARM Juno
   system.

 - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
   option to the or expression expected in the intercepted
   perf_event_open() syscall.

 - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
   'perf annotate' asm parser.

 - Fix 'perf mem record -C' option processing, it was being chopped up
   when preparing the underlying 'perf record -e mem-events' and thus
   being ignored, requiring using '-- -C CPUs' as a workaround.

 - Improvements and tidy ups for 'perf test' shell infra.

 - Fix Intel PT information printing segfault in uClibc, where a NULL
   format was being passed to fprintf.

* tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
  perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
  perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
  perf tests stat+json_output: Include sanity check for topology
  perf tests stat+csv_output: Include sanity check for topology
  perf intel-pt: Fix system_wide dummy event for hybrid
  perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
  perf test: Fix attr tests for PERF_FORMAT_LOST
  perf test: test_intel_pt.sh: Add 9 tests
  perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
  perf test: test_intel_pt.sh: Add jitdump test
  perf test: test_intel_pt.sh: Tidy some alignment
  perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
  perf test: test_intel_pt.sh: Tidy some perf record options
  perf test: test_intel_pt.sh: Fix return checking again
  perf: Skip and warn on unknown format 'configN' attrs
  perf list: Fix metricgroups title message
  perf mem: Fix -C option behavior for perf mem record
  perf annotate: Add missing condition flags for arm64
  ...
2022-10-16 15:14:29 -07:00
Linus Torvalds
2df76606db Kbuild fixes for v6.1
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for
    the combination of Clang >= 14 and GAS <= 2.35.
 
  - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
    the package size.
 
  - Fix modpost error under build environments using musl.
 
  - Make *.ll files keep value names for easier debugging
 
  - Fix single directory build
 
  - Prevent RISC-V from selecting the broken DWARF5 support when Clang
    and GAS are used together.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmNMSCcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGuVsP/j9FBN3x9S14gAHpu4BAFLK0s31W
 A5sGtmEb1keLqW4oY7/5bcr8KgIrY1extJBeSOJHLB1z/cfU7CHd7bl3+oadZH+z
 BNQ7F9SAHm9GuZoM58TMmC5/Eq0a45bqEP32wvoscyrFQ0ka11aQw/lOZmVTYSgO
 NrTHUSD6NmJCG8hbMiJAH8ch+fziSR0JXOomOwJDxs63aXHhavjZ3z7pgySnuPav
 PD46QtKtpjH8H+gx4nJMqDWjaukGlq7+kVIHhZh3oC5KU23UfUc3d3U+Lpati4+w
 Ggl1pmR5iMsYioQ/MaC58hb06WkamAYRfxKWXvpzEAVGIHF+xhMdGybK4FOPQkQh
 J9Rb358LD1d/QtH6C77wajaEj1FvQLaOQ8CHUDSzjgGwJuz+qrpI8kwtgRxJCXgp
 0+2YQxdfWR2kJ9W7lnyguVjM7AYebqS7bCGm2fDPU92NWftw4y2TJii1v10BCD/N
 dB3orKHPp3mosAS2SdTXgMYYMlzFMzgma0PzibWvm4DE4tHtndRMvW/8c5UyB8uk
 ganuHOUg8Vup79OiANSD6lJrzq0fZofvD3euD61mis6s39GAeHvr5rlwy0xOoN8A
 TgOBu2DQFUKrlZH2m4F+hEBzCz26HTkg8+S5DNpb7Qr2EKDlLPT3xjwhQlooipNc
 KuZNXoR6wEstepn/
 =EZAr
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
   combination of Clang >= 14 and GAS <= 2.35.

 - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
   the package size.

 - Fix modpost error under build environments using musl.

 - Make *.ll files keep value names for easier debugging

 - Fix single directory build

 - Prevent RISC-V from selecting the broken DWARF5 support when Clang
   and GAS are used together.

* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
  kbuild: fix single directory build
  kbuild: add -fno-discard-value-names to cmd_cc_ll_c
  scripts/clang-tools: Convert clang-tidy args to list
  modpost: put modpost options before argument
  kbuild: Stop including vmlinux.bz2 in the rpm's
  Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
  Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
2022-10-16 11:12:22 -07:00
Linus Torvalds
2fcd8f108f This is the final part of the clk patches for this merge window.
The clk rate range series needed another week to fully bake. Maxime
 fixed the bug that broke clk notifiers and prevented this from being
 included in the first pull request. He also added a unit test on top to
 make sure it doesn't break so easily again. The majority of the series
 fixes up how the clk_set_rate_*() APIs work, particularly around when
 the rate constraints are dropped and how they move around when
 reparenting clks. Overall it's a much needed improvement to the clk rate
 range APIs that used to be pretty broken if you looked sideways.
 
 Beyond the core changes there are a few driver fixes for a compilation
 issue or improper data causing clks to fail to register or have the
 wrong parents. These are good to get in before the first -rc so that the
 system actually boots on the affected devices.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmNLfwARHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSW1uhAA0QMgI8Ywv18PfZi2vWGpAO9sB62DfmdU
 sbmsXrfEHdJmXmqT66Ydr8area6loeCagK4Vm/dEcgsf2DwI/xCdDra/ARZjLU49
 9VjgC332YLZzk6bHXsY2eqCA2TS6nV4ZoVsonkQv2vezYNLNk7FTgByzIcWpYiY8
 RuKEVHnp2yWwk5HrX+pELR0dMDCLTB3p+WVHmnQpyYVK+rcfaSNuDxLTSNXb3yEl
 tGTUu4eL09FMwyRZ4iGmRXpvzIacbthYmnGmEtnOYDeFV3k3wBwHPbEizstvS0MI
 vv89aHdsOnGgTdzPUZtA6UppByajyoDKbYzb3hw8pUPNNykbq6XmaeBTV7U2O9De
 ihfeHVlDjN2HCv1HXfDsCaqlD6WM25T+pmKPT45Zj9b+rKkxloVxsOBuLmMzDgNd
 fa1X3qfGfzm5jpZ1SVSTLdRSqOT5Q00nzAgQailUiDoumgdTSN5ZDNPHcIv/Crvn
 me+pabVldp0tgYvuNWYr46u7ugwnzUMBVJfnQ+xZTl8xqLQ/yRmkkhB/rsS5RDY1
 z6NZ66JyHpSwnaev4Ozjyj8GlRdgDUVHGa/4Vm1jJSbUb7THZGSzjCklZXPxkn8I
 VqHNJN+luzaf6bKe85yrgUJKOi8NMZZS//MKWDnOdhDqgqtI0pM4hJX+Tvb2Bc4B
 2SA7XzHesKg=
 =ZG8Y
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull more clk updates from Stephen Boyd:
 "This is the final part of the clk patches for this merge window.

  The clk rate range series needed another week to fully bake. Maxime
  fixed the bug that broke clk notifiers and prevented this from being
  included in the first pull request. He also added a unit test on top
  to make sure it doesn't break so easily again. The majority of the
  series fixes up how the clk_set_rate_*() APIs work, particularly
  around when the rate constraints are dropped and how they move around
  when reparenting clks. Overall it's a much needed improvement to the
  clk rate range APIs that used to be pretty broken if you looked
  sideways.

  Beyond the core changes there are a few driver fixes for a compilation
  issue or improper data causing clks to fail to register or have the
  wrong parents. These are good to get in before the first -rc so that
  the system actually boots on the affected devices"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
  clk: tegra: Fix Tegra PWM parent clock
  clk: at91: fix the build with binutils 2.27
  clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
  clk: mediatek: clk-mux: Add .determine_rate() callback
  clk: tests: Add tests for notifiers
  clk: Update req_rate on __clk_recalc_rates()
  clk: tests: Add missing test case for ranges
  clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
  clk: Introduce the clk_hw_get_rate_range function
  clk: Zero the clk_rate_request structure
  clk: Stop forwarding clk_rate_requests to the parent
  clk: Constify clk_has_parent()
  clk: Introduce clk_core_has_parent()
  clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
  clk: Add our request boundaries in clk_core_init_rate_req
  clk: Introduce clk_hw_init_rate_request()
  clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
  clk: Change clk_core_init_rate_req prototype
  clk: Set req_rate on reparenting
  clk: Take into account uncached clocks in clk_set_rate_range()
  ...
2022-10-16 11:08:19 -07:00
Linus Torvalds
b08cd74448 15 cifs/smb3 fixes including 2 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmNLLdMACgkQiiy9cAdy
 T1EWGwv/SN2nGgQ5QBxcqHmKyYyP/ndo3iig6L55VdBrWjTIH6vMAJfDAZwq5Veb
 YNkN6EQhT4XQ8RQ3nzJdSuwSLZu3CF5JMdChyoM3qgLSrT9CU3n2E5n0S/OK7TNI
 QvX2RMZrOiwl2sApJXk5xQgmocGwNOMdsI9IwxZ8zDATpLUicwYmcNajDTmPHjX7
 W8IVh4hHTBmVpnN5WkKLninLCz23eqcEMQb+nxtD3sMhOAUGn6gkKo+0L1EOYwgS
 jFfZxGbLoIBi1IhMcoWEcODnDtpwS9vtGDy2ZPVkefp5LqjjEEqR7LQLAvSt0dYK
 MpXHAvTUO8T7EEDx2djUNVM0JfXYBa0w3iOqSkOS2EBZF/Q8/ABWccfNpCTnGoAZ
 G3VJvDP/K5iPiRwy/0KK6CeCvCiXgBslcvRVdIvSCsehuvEyYTXRrjRLHElZBVZz
 Odwn8C9UGy5w0DTQ6QvwYtYWpoi2UFm8GuoswOfljYwnrul+7P0R14xxARJE8TOh
 fBXL0FAu
 =8BkB
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - fix a regression in guest mounts to old servers

 - improvements to directory leasing (caching directory entries safely
   beyond the root directory)

 - symlink improvement (reducing roundtrips needed to process symlinks)

 - an lseek fix (to problem where some dir entries could be skipped)

 - improved ioctl for returning more detailed information on directory
   change notifications

 - clarify multichannel interface query warning

 - cleanup fix (for better aligning buffers using ALIGN and round_up)

 - a compounding fix

 - fix some uninitialized variable bugs found by Coverity and the kernel
   test robot

* tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: improve SMB3 change notification support
  cifs: lease key is uninitialized in two additional functions when smb1
  cifs: lease key is uninitialized in smb1 paths
  smb3: must initialize two ACL struct fields to zero
  cifs: fix double-fault crash during ntlmssp
  cifs: fix static checker warning
  cifs: use ALIGN() and round_up() macros
  cifs: find and use the dentry for cached non-root directories also
  cifs: enable caching of directories for which a lease is held
  cifs: prevent copying past input buffer boundaries
  cifs: fix uninitialised var in smb2_compound_op()
  cifs: improve symlink handling for smb2+
  smb3: clarify multichannel warning
  cifs: fix regression in very old smb1 mounts
  cifs: fix skipping to incorrect offset in emit_cached_dirents
2022-10-16 11:01:40 -07:00
Tetsuo Handa
80493877d7 Revert "cpumask: fix checking valid cpu range".
This reverts commit 78e5a33994 ("cpumask: fix checking valid cpu range").

syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at
cpu_max_bits_warn() [1], for commit 78e5a33994 ("cpumask: fix checking
valid cpu range") is broken.  Obviously that patch hits WARN_ON_ONCE()
when e.g.  reading /proc/cpuinfo because passing "cpu + 1" instead of
"cpu" will trivially hit cpu == nr_cpumask_bits condition.

Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
this problem was not fixed immediately.  As a result, that patch was
sent to linux.git before the patch author recognizes this problem, and
syzbot started failing to test changes in linux.git since 2022/10/10
[3].

Andrew Jones proposed a fix for x86 and riscv architectures [4].  But
[2] and [5] indicate that affected locations are not limited to arch
code.  More delay before we find and fix affected locations, less tested
kernel (and more difficult to bisect and fix) before release.

We should have inspected and fixed basically all cpumask users before
applying that patch.  We should not crash kernels in order to ask
existing cpumask users to update their code, even if limited to
CONFIG_DEBUG_PER_CPU_MAPS=y case.

Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-16 10:45:17 -07:00
Nathan Chancellor
0a6de78cff lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:

  /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported

Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:

  .Ldebug_loc0:
          .byte   4                               # DW_LLE_offset_pair
          .uleb128 .Lfunc_begin0-.Lfunc_begin0    #   starting offset
          .uleb128 .Ltmp1-.Lfunc_begin0           #   ending offset
          .byte   1                               # Loc expr size
          .byte   90                              # DW_OP_reg10
          .byte   0                               # DW_LLE_end_of_list

  .Ldebug_ranges0:
          .byte   4                               # DW_RLE_offset_pair
          .uleb128 .Ltmp6-.Lfunc_begin0           #   starting offset
          .uleb128 .Ltmp27-.Lfunc_begin0          #   ending offset
          .byte   4                               # DW_RLE_offset_pair
          .uleb128 .Ltmp28-.Lfunc_begin0          #   starting offset
          .uleb128 .Ltmp30-.Lfunc_begin0          #   ending offset
          .byte   0                               # DW_RLE_end_of_list

There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.

To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-17 02:06:47 +09:00
Masahiro Yamada
3753af778d kbuild: fix single directory build
Commit f110e5a250 ("kbuild: refactor single builds of *.ko") was wrong.

KBUILD_MODULES _is_ needed for single builds.

Otherwise, "make foo/bar/baz/" does not build module objects at all.

Fixes: f110e5a250 ("kbuild: refactor single builds of *.ko")
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: David Sterba <dsterba@suse.com>
2022-10-17 02:03:52 +09:00
Linus Torvalds
1501278bb7 slab hotfix for 6.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmNLD/EACgkQ4CHKc/GJ
 qRCnhQf+Oj0qB9bdy+MmgirN0/VHFmTQbNSYUd/gzGmfcAHpxIE9KG0V9+y9I2wG
 Nh6WgUKwX1IEKQ37X+VT/XsIe9VcALcn5LjxD/J4cL71CREa/0HGQbBavt9GuDsC
 zkUwxYx6iAtGfK/PK9jE2eHIzxzfZ6kEkFsMaS+jP/8iLnE9trAhQ1o6vG15EFPA
 MHjJ3+y7AsUE7SYHKL+8WLA+QR443SlHN0u327KkA2kKpjsj+hqQdiPfHqOArBbo
 vw2DI14tcELGtruo5zHMVT9TcXWV7hcJ6yTTnaKxI+WCbgsEpPQKevTmc7q9P0H4
 hLgQEElRuzBrXUCIBPVboNuTgGNjLQ==
 =cVwd
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab hotfix from Vlastimil Babka:
 "A single fix for the common-kmalloc series, for warnings on mips and
  sparc64 reported by Guenter Roeck"

* tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
2022-10-15 17:05:07 -07:00
Linus Torvalds
36d8a3edf8 OpenRISC 6.1 Updates
I have relocated to London so not much work from me while I get settled.
 
 Still, OpenRISC picked up two patches in this window:
  - Fix for kernel page table walking from Jann Horn
  - MAINTAINER entry cleanup from Palmer Dabbelt
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmNK478ACgkQw7McLV5m
 J+QdSw/+NFTmfpo40EGa3IdrK9ZQCgQoPN7rGKEr+QpRcVZuER+oDax6+hbobmBp
 E+0Trdx8px/YJLWVC9imGxqMS8ev1twLy3VrCi+5p7B6mYUkMcDkf5Et1uI5WLky
 3BBnw92Ycmc+bR2ROESRp4mvMHDdVhdIt60sHSFjcGTH2FnB6IiwUFX/GrcxYy9U
 kkLrQ7AW36SzVIZ/0CNYZwxvlHv5n9aFiii0SKQriCuWDT2EvUw3f4kaOEzTg7pA
 UKUU3CU8n4Rg9LeHiVb0KqLlGS+/ZntO9Ad2jgYvF4X42ijLnTgjSUOjmeC1D5R8
 N0h1Y9WByL513MymfnptVrZSIhVOFhtsPgP2ys4CXMfHX+0iRKNOt3ZpSHFXpjDe
 SlVA2k+BBt8SPwAIMoQBClaGKdgT0vjTE/l389BujhIDgdG18HEIhB77SLnyXvNq
 VuXfeRlQuvEkSGH0sMdFr9rBM08Q3fS5S+i+2YJucChLndnZr+iE2KhYiG/tzia3
 u64LmYDrPFNTIDjg8huLnHmUxlVcPK9vQSn3VRI1pKn47v4zD3rHa4mFryR6rhNw
 xQDFOmAeeOmBiHoatwObGHaItEChLWvBW63JdUh873N2lg5UwcqqKp8e10KT35kG
 CZA3g4OSTXM+PDWOJi1tDec08smIFqUv/U53tckcGV4b0myFNk0=
 =WKxB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
 "I have relocated to London so not much work from me while I get
  settled.

  Still, OpenRISC picked up two patches in this window:

   - Fix for kernel page table walking from Jann Horn

   - MAINTAINER entry cleanup from Palmer Dabbelt"

* tag 'for-linus' of https://github.com/openrisc/linux:
  MAINTAINERS: git://github -> https://github.com for openrisc
  openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
2022-10-15 16:47:33 -07:00
Linus Torvalds
41410965c3 pci-v6.1-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmNKzlUUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxBbA//QYuw8iZHzshJntCyWlTRg742x2jI
 2NrLojVG/ccSlqnl/KRultaPFdSlZb221UASt7CAEIGoP6+/SNepazEEJ4QJhMX6
 inY/hYrVd2OHhCnO+OoNd7DKfbeJRkFVMHOGJdpbQ7jMwa99YSaN6xEMfALcZIrg
 8CqYKDpdJW4avYyKlfXayY42d8gljWJwfLuK1xEnnIrdvXQEB3vj3yKxziadhBtA
 EJbLWORVF+yQwq63NCPkLvCe0ZuV09a4q4IhlzWo+30FLMOsk2z4HBi1B2fwlawV
 tnYCt0i6FNC6e3DJyWX/hkAiIbJUadnych+LcGq+/FY30tCBtNkNaQN3XsrvdvI4
 5piacj1av21R7JHSyk64M4jdNlE7MYonQNsc8vp4yDtlAjQYkaj7BJdsIq3MCTei
 ce6mxguVb2hrMr4fZXc8ka7XZkofJNv/XhfuQAorzKfEqA8cI4enICsPdoTFyZDV
 pYYuHIJXY4rTm0wUD+TlDg/Cn6gyLu3WYVyEn6q/E53Twj1rr5ske7igDMDVYYly
 qMg8CTPHYzHJ0wbzkf0CfND7SVAAE1BnIzr4Drfbrm+fi11HVhd0u/0XiWf2NthW
 /ZFQUDBh0SwSHgUnO+ONqcR3arhIT4PEsEJadD4kuW4nIyPS7t1Nfu0V3XUZDa14
 U6UeL6URnXBDV7U=
 =gs3v
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fix from Bjorn Helgaas:
 "Revert the attempt to distribute spare resources to unconfigured
  hotplug bridges at boot time.

  This fixed some dock hot-add scenarios, but Jonathan Cameron reported
  that it broke a topology with a multi-function device where one
  function was a Switch Upstream Port and the other was an Endpoint"

* tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "PCI: Distribute available resources for root buses, too"
2022-10-15 16:36:38 -07:00
Hyeonggon Yoo
e36ce448a0 mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
After commit d6a71648db ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.

SLAB has been using kmalloc caches to allocate freelist_idx_t array for
off slab caches. But after the commit, freelist_size can be bigger than
KMALLOC_MAX_CACHE_SIZE.

Instead of using pointer to kmalloc cache, use kmalloc_node() and only
check if the kmalloc cache is off slab during calculate_slab_order().
If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens
as it allocates freelist_idx_t array directly from buddy.

Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648db ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-15 21:42:05 +02:00
Palmer Dabbelt
34a0bac084 MAINTAINERS: git://github -> https://github.com for openrisc
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.

Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2022-10-15 17:26:51 +01:00
Steve French
e3e9463414 smb3: improve SMB3 change notification support
Change notification is a commonly supported feature by most servers,
but the current ioctl to request notification when a directory is
changed does not return the information about what changed
(even though it is returned by the server in the SMB3 change
notify response), it simply returns when there is a change.

This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
information structure which includes the name of the file(s) that
changed and why. See MS-SMB2 2.2.35 for details on the individual
filter flags and the file_notify_information structure returned.

To use this simply pass in the following (with enough space
to fit at least one file_notify_information structure)

struct __attribute__((__packed__)) smb3_notify {
       uint32_t completion_filter;
       bool     watch_tree;
       uint32_t data_len;
       uint8_t  data[];
} __packed;

using CIFS_IOC_NOTIFY_INFO 0xc009cf0b
 or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info)

The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).

Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15 10:05:53 -05:00
Steve French
2bff065933 cifs: lease key is uninitialized in two additional functions when smb1
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized
in smb1 mounts.  It is cleaner to set lease key to zero in these
places where leases are not supported (smb1 can not return lease keys
so the field was uninitialized).

Addresses-Coverity: 1514207 ("Uninitialized scalar variable")
Addresses-Coverity: 1514331 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15 10:05:53 -05:00
Steve French
625b60d4f9 cifs: lease key is uninitialized in smb1 paths
It is cleaner to set lease key to zero in the places where leases are not
supported (smb1 can not return lease keys so the field was uninitialized).

Addresses-Coverity: 1513994 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15 10:05:53 -05:00
Steve French
f09bd695af smb3: must initialize two ACL struct fields to zero
Coverity spotted that we were not initalizing Stbz1 and Stbz2 to
zero in create_sd_buf.

Addresses-Coverity: 1513848 ("Uninitialized scalar variable")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15 10:05:53 -05:00
Paulo Alcantara
b854b4ee66 cifs: fix double-fault crash during ntlmssp
The crash occurred because we were calling memzero_explicit() on an
already freed sess_data::iov[1] (ntlmsspblob) in sess_free_buffer().

Fix this by not calling memzero_explicit() on sess_data::iov[1] as
it's already by handled by callers.

Fixes: a4e430c8c8 ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15 10:04:38 -05:00
Arnaldo Carvalho de Melo
a3a365655a tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:

  b8d1d16360 ("x86/apic: Don't disable x2APIC if locked")
  ca5b7c0d96 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2022-10-14 18:06:34.294561729 -0300
  +++ after	2022-10-14 18:06:41.285744044 -0300
  @@ -264,6 +264,7 @@
   	[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
   	[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
   	[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
  +	[0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
   	[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
   	[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
   	[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
  $

Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Qi Liu
5e91e57e68 perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
Add support for using 'perf report --dump-raw-trace' to parse PTT packet.

Example usage:

Output will contain raw PTT data and its textual representation, such
as (8DW format):

0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0
ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
.
. ... HISI PTT data: size 4194304 bytes
.  00000000: 00 00 00 00                                 Prefix
.  00000004: 08 20 00 60                                 Header DW0
.  00000008: ff 02 00 01                                 Header DW1
.  0000000c: 20 08 00 00                                 Header DW2
.  00000010: 10 e7 44 ab                                 Header DW3
.  00000014: 2a a8 1e 01                                 Time
.  00000020: 00 00 00 00                                 Prefix
.  00000024: 01 00 00 60                                 Header DW0
.  00000028: 0f 1e 00 01                                 Header DW1
.  0000002c: 04 00 00 00                                 Header DW2
.  00000030: 40 00 81 02                                 Header DW3
.  00000034: ee 02 00 00                                 Time
....

This patch only add basic parsing support according to the definition of
the PTT packet described in Documentation/trace/hisi-ptt.rst. And the
fields of each packet can be further decoded following the PCIe Spec's
definition of TLP packet.

Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Qi Liu
057381a7ec perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).

This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Qi Liu
45a3975f8e perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
Add find_pmu_for_event() and use to simplify logic in
auxtrace_record_init(). find_pmu_for_event() will be reused in
subsequent patches.

Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi6124@gmail.com>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zeng Prime <prime.zeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Athira Rajeev
58d4802a5e perf tests stat+json_output: Include sanity check for topology
Testcase stat+json_output.sh fails in powerpc:

	86: perf stat JSON output linter : FAILED!

The testcase "stat+json_output.sh" verifies perf stat JSON output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.

The values for socket, die, core and cpu are fetched from topology
directory:

  /sys/devices/system/cpu/cpu*/topology.

For example, socket value is fetched from "physical_package_id" file of
topology directory.  (cpu__get_topology_int() in util/cpumap.c)

If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.

Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts  number of fields in the
output.

Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.

Patch here adds a sanity check in the testcase for topology.  Check will
help to skip the test if -1 value found.

Fixes: 0c343af2a2 ("perf test: JSON format checking")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Athira Rajeev
cd400f6f18 perf tests stat+csv_output: Include sanity check for topology
Testcase stat+csv_output.sh fails in powerpc:

	84: perf stat CSV output linter: FAILED!

The testcase "stat+csv_output.sh" verifies perf stat CSV output. The
test covers aggregation modes like per-socket, per-core, per-die, -A
(no_aggr mode) along with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE mode), expects 7
fields in the output having "CPU" as first field. Same way, for
per-socket, it expects the first field in result to point to socket id.
The testcases compares the result with expected count.

The values for socket, die, core and cpu are fetched from topology
directory:

  /sys/devices/system/cpu/cpu*/topology.

For example, socket value is fetched from "physical_package_id" file of
topology directory.  (cpu__get_topology_int() in util/cpumap.c)

If a platform fails to fetch the topology information, values will be
set to -1. For example, incase of pSeries platform of powerpc, value for
"physical_package_id" is restricted and not exposed. So, -1 will be
assigned.

Perf code has a checks for valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases where topology
values not exposed, first field of the output displaying will be empty.
This cause the testcase to fail, as it counts  number of fields in the
output.

Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output,
becos of -1 value obtained from topology files for some, only 6 fields
are printed. Hence a testcase failure reported due to mismatch in number
of fields in the output.

Patch here adds a sanity check in the testcase for topology.  Check will
help to skip the test if -1 value found.

Fixes: 7473ee56db ("perf test: Add checking for perf stat CSV output.")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00
Adrian Hunter
6cef7dab3e perf intel-pt: Fix system_wide dummy event for hybrid
User space tasks can migrate between CPUs, so when tracing selected CPUs,
system-wide sideband is still needed, however evlist->core.has_user_cpus
is not set in the hybrid case, so check the target cpu_list instead.

Fixes: 7d189cadbe ("perf intel-pt: Track sideband system-wide when needed")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15 10:13:16 -03:00