Commit graph

997577 commits

Author SHA1 Message Date
Rong Chen
19ec368cbc selftests/vm: fix out-of-tree build
When building out-of-tree, attempting to make target from $(OUTPUT) directory:

  make[1]: *** No rule to make target '$(OUTPUT)/protection_keys.c', needed by '$(OUTPUT)/protection_keys_32'.

Link: https://lkml.kernel.org/r/20210315094700.522753-1-rong.a.chen@intel.com
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Sean Christopherson
c2655835fd mm/mmu_notifiers: ensure range_end() is paired with range_start()
If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers.  If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to
be paired, e.g.  KVM's mmu_notifier_count would become imbalanced.
Disallow notifiers that can fail _start() from implementing _end() so
that it's unnecessary to either track which notifiers rejected _start(),
or had already succeeded prior to a failed _start().

Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintented "feature".
Make it canon now that the behavior is depended on for correctness.

As of today, the bug is likely benign:

  1. The only caller of the non-blocking notifier is OOM kill.
  2. The only notifiers that can fail _start() are the i915 and Nouveau
     drivers.
  3. The only notifiers that utilize _end() are the SGI UV GRU driver
     and KVM.
  4. The GRU driver will never coincide with the i195/Nouveau drivers.
  5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
     _guest_, and the guest is already doomed due to being an OOM victim.

Fix the bug now to play nice with future usage, e.g.  KVM has a
potential use case for blocking memslot updates in KVM while an
invalidation is in-progress, and failure to unblock would result in said
updates being blocked indefinitely and hanging.

Found by inspection.  Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
and observing that KVM exits with an elevated notifier count.

Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Andrey Konovalov
cf10bd4c4a kasan: fix per-page tags for non-page_alloc pages
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.

Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff...  address for pages that
were not allocated via page_alloc.

This might cause problems.  A particular case we encountered is a
conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags).  This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.

This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff.  This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.

With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.

This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.

Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c029 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Miaohe Lin
d85aecf284 hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
The current implementation of hugetlb_cgroup for shared mappings could
have different behavior.  Consider the following two scenarios:

 1.Assume initial css reference count of hugetlb_cgroup is 1:
  1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
      count is 2 associated with 1 file_region.
  1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
      count is 3 associated with 2 file_region.
  1.3 coalesce_file_region will coalesce these two file_regions into
      one. So css reference count is 3 associated with 1 file_region
      now.

 2.Assume initial css reference count of hugetlb_cgroup is 1 again:
  2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
      count is 2 associated with 1 file_region.

Therefore, we might have one file_region while holding one or more css
reference counts. This inconsistency could lead to imbalanced css_get()
and css_put() pair. If we do css_put one by one (i.g. hole punch case),
scenario 2 would put one more css reference. If we do css_put all
together (i.g. truncate case), scenario 1 will leak one css reference.

The imbalanced css_get() and css_put() pair would result in a non-zero
reference when we try to destroy the hugetlb cgroup. The hugetlb cgroup
directory is removed __but__ associated resource is not freed. This
might result in OOM or can not create a new hugetlb cgroup in a busy
workload ultimately.

In order to fix this, we have to make sure that one file_region must
hold exactly one css reference. So in coalesce_file_region case, we
should release one css reference before coalescence. Also only put css
reference when the entire file_region is removed.

The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region.
But this area might be scattered when there are already some
file_regions reside in it. As a result, many file_regions may share only
one h_cg->css reference. In order to ensure that one file_region must
hold exactly one css reference, we should do css_get() for each
file_region and release the reference held by caller when they are done.

[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
  Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com

Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Jens Axboe
f5d2d23bf0 io-wq: fix race around pending work on teardown
syzbot reports that it's triggering the warning condition on having
pending work on shutdown:

WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
Modules linked in:
CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
 io_uring_files_cancel include/linux/io_uring.h:47 [inline]
 do_exit+0x258/0x2340 kernel/exit.c:780
 do_group_exit+0x168/0x2d0 kernel/exit.c:922
 get_signal+0x1734/0x1ef0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x465f69

which shouldn't happen, but seems to be possible due to a race on whether
or not the io-wq manager sees a fatal signal first, or whether the io-wq
workers do. If we race with queueing work and then send a fatal signal to
the owning task, and the io-wq worker sees that before the manager sets
IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
behind.

Just turn the WARN_ON_ONCE() into a cancelation condition instead.

Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-25 10:16:12 -06:00
Potnuri Bharat Teja
3408be145a RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server
Not setting the ipv6 bit while destroying ipv6 listening servers may
result in potential fatal adapter errors due to lookup engine memory hash
errors. Therefore always set ipv6 field while destroying ipv6 listening
servers.

Fixes: 830662f6f0 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://lore.kernel.org/r/20210324190453.8171-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-25 10:25:58 -03:00
Rich Wiley
20109a859a arm64: kernel: disable CNP on Carmel
On NVIDIA Carmel cores, CNP behaves differently than it does on standard
ARM cores. On Carmel, if two cores have CNP enabled and share an L2 TLB
entry created by core0 for a specific ASID, a non-shareable TLBI from
core1 may still see the shared entry. On standard ARM cores, that TLBI
will invalidate the shared entry as well.

This causes issues with patchsets that attempt to do local TLBIs based
on cpumasks instead of broadcast TLBIs. Avoid these issues by disabling
CNP support for NVIDIA Carmel cores.

Signed-off-by: Rich Wiley <rwiley@nvidia.com>
Link: https://lore.kernel.org/r/20210324002809.30271-1-rwiley@nvidia.com
[will: Fix pre-existing whitespace issue]
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-25 10:00:23 +00:00
Maninder Singh
baa96377bc arm64/process.c: fix Wmissing-prototypes build warnings
Fix GCC warnings reported when building with "-Wmissing-prototypes":

  arch/arm64/kernel/process.c:261:6: warning: no previous prototype for '__show_regs' [-Wmissing-prototypes]
      261 | void __show_regs(struct pt_regs *regs)
          |      ^~~~~~~~~~~
  arch/arm64/kernel/process.c:307:6: warning: no previous prototype for '__show_regs_alloc_free' [-Wmissing-prototypes]
      307 | void __show_regs_alloc_free(struct pt_regs *regs)
          |      ^~~~~~~~~~~~~~~~~~~~~~
  arch/arm64/kernel/process.c:365:5: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes]
      365 | int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
          |     ^~~~~~~~~~~~~~~~~~~~
  arch/arm64/kernel/process.c:546:41: warning: no previous prototype for '__switch_to' [-Wmissing-prototypes]
      546 | __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
          |                                         ^~~~~~~~~~~
  arch/arm64/kernel/process.c:710:25: warning: no previous prototype for 'arm64_preempt_schedule_irq' [-Wmissing-prototypes]
      710 | asmlinkage void __sched arm64_preempt_schedule_irq(void)
          |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~

Link: https://lore.kernel.org/lkml/202103192250.AennsfXM-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Link: https://lore.kernel.org/r/1616568899-986-1-git-send-email-maninder1.s@samsung.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-25 09:50:16 +00:00
Martin Wilck
36fa766faa scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
If pscsi_map_sg() fails, make sure to drop references to already allocated
bios.

Link: https://lore.kernel.org/r/20210323212431.15306-2-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 23:19:23 -04:00
Martin Wilck
077ce028b8 scsi: target: pscsi: Avoid OOM in pscsi_map_sg()
pscsi_map_sg() uses the variable nr_pages as a hint for bio_kmalloc() how
many vector elements to allocate. If nr_pages is < BIO_MAX_PAGES, it will
be reset to 0 after successful allocation of the bio.

If bio_add_pc_page() fails later for whatever reason, pscsi_map_sg() tries
to allocate another bio, passing nr_vecs = 0. This causes bio_add_pc_page()
to fail immediately in the next call. pci_map_sg() continues to allocate
zero-length bios until memory is exhausted and the kernel crashes with
OOM. This can be easily observed by exporting a SATA DVD drive via pscsi.
The target crashes as soon as the client tries to access the DVD LUN. In
the case I analyzed, bio_add_pc_page() would fail because the DVD device's
max_sectors_kb (128) was exceeded.

Avoid this by simply not resetting nr_pages to 0 after allocating the
bio. This way, the client receives an I/O error when it tries to send
requests exceeding the devices max_sectors_kb, and eventually gets it
right. The client must still limit max_sectors_kb e.g. by an udev rule if
(like in my case) the driver doesn't report valid block limits, otherwise
it encounters I/O errors.

Link: https://lore.kernel.org/r/20210323212431.15306-1-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 23:19:23 -04:00
Jia-Ju Bai
3401ecf7fc scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
When kzalloc() returns NULL, no error return code of mpt3sas_base_attach()
is assigned. To fix this bug, r is assigned with -ENOMEM in this case.

Link: https://lore.kernel.org/r/20210308035241.3288-1-baijiaju1990@gmail.com
Fixes: c696f7b83e ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 22:07:45 -04:00
Jia-Ju Bai
f69953837c scsi: qedi: Fix error return code of qedi_alloc_global_queues()
When kzalloc() returns NULL to qedi->global_queues[i], no error return code
of qedi_alloc_global_queues() is assigned.  To fix this bug, status is
assigned with -ENOMEM in this case.

Link: https://lore.kernel.org/r/20210308033024.27147-1-baijiaju1990@gmail.com
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 22:04:21 -04:00
Bart Van Assche
39c0c8553b scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
Calling vha->hw->tgt.tgt_ops->free_cmd() from qlt_xmit_response() is wrong
since the command for which a response is sent must remain valid until the
SCSI target core calls .release_cmd(). It has been observed that the
following scenario triggers a kernel crash:

 - qlt_xmit_response() calls qlt_check_reserve_free_req()

 - qlt_check_reserve_free_req() returns -EAGAIN

 - qlt_xmit_response() calls vha->hw->tgt.tgt_ops->free_cmd(cmd)

 - transport_handle_queue_full() tries to retransmit the response

Fix this crash by reverting the patch that introduced it.

Link: https://lore.kernel.org/r/20210320232359.941-2-bvanassche@acm.org
Fixes: 0dcec41acb ("scsi: qla2xxx: Make sure that aborted commands are freed")
Cc: Quinn Tran <qutran@marvell.com>
Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 21:45:06 -04:00
Tyrel Datwyler
62fc266148 scsi: ibmvfc: Make ibmvfc_wait_for_ops() MQ aware
During MQ enablement of the ibmvfc driver ibmvfc_wait_for_ops() was
missed. This function is responsible for waiting on commands to complete
that match a certain criteria such as LUN or cancel key. The implementation
as is only scans the CRQ for events ignoring any sub-queues and as a result
will exit successfully without doing anything when operating in MQ
channelized mode.

Check the MQ and channel use flags to determine which queues are
applicable, and scan each queue accordingly. Note in MQ mode SCSI commands
are only issued down sub-queues and the CRQ is only used for driver
specific management commands. As such the CRQ events are ignored when
operating in MQ mode with channels.

Link: https://lore.kernel.org/r/20210319205029.312969-3-tyreld@linux.ibm.com
Fixes: 9000cb998b ("scsi: ibmvfc: Enable MQ and set reasonable defaults")
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 21:42:30 -04:00
Tyrel Datwyler
8b1c9b2025 scsi: ibmvfc: Fix potential race in ibmvfc_wait_for_ops()
For various EH activities the ibmvfc driver uses ibmvfc_wait_for_ops() to
wait for the completion of commands that match a given criteria be it
cancel key, or specific LUN. With recent changes commands are completed
outside the lock in bulk by removing them from the sent list and adding
them to a private completion list. This introduces a potential race in
ibmvfc_wait_for_ops() since the criteria for a command to be outstanding is
no longer simply being on the sent list, but instead not being on the free
list.

Avoid this race by scanning the entire command event pool and checking that
any matching command that ibmvfc needs to wait on is not already on the
free list.

Link: https://lore.kernel.org/r/20210319205029.312969-2-tyreld@linux.ibm.com
Fixes: 1f4a4a1950 ("scsi: ibmvfc: Complete commands outside the host/queue lock")
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-24 21:42:30 -04:00
Linus Torvalds
e138138003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Various fixes, all over:

   1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.

   2) Always store the rx queue mapping in veth, from Maciej
      Fijalkowski.

   3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.

   4) Fix memory leak in octeontx2-af from Colin Ian King.

   5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
      Yonghong Song.

   6) Fix tx ptp stats in mlx5, from Aya Levin.

   7) Check correct ip version in tun decap, fropm Roi Dayan.

   8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.

   9) Work item memork leak in mlx5, from Shay Drory.

  10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.

  11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.

  12) Fix data race in pxa168_eth, from Pavel Andrianov.

  13) Range validate stab in red_check_params(), from Eric Dumazet.

  14) Inherit vlan filtering setting properly in b53 driver, from
      Florian Fainelli.

  15) Fix rtnl locking in igc driver, from Sasha Neftin.

  16) Pause handling fixes in igc driver, from Muhammad Husaini
      Zulkifli.

  17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.

  18) Use after free in qlcnic, from Lv Yunlong.

  19) fix crash in fritzpci mISDN, from Tong Zhang.

  20) Premature rx buffer reuse in igb, from Li RongQing.

  21) Missing termination of ip[a driver message handler arrays, from
      Alex Elder.

  22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
      driver, from Xie He.

  23) Use after free in c_can_pci_remove(), from Tong Zhang.

  24) Uninitialized variable use in nl80211, from Jarod Wilson.

  25) Off by one size calc in bpf verifier, from Piotr Krysiuk.

  26) Use delayed work instead of deferrable for flowtable GC, from
      Yinjun Zhang.

  27) Fix infinite loop in NPC unmap of octeontx2 driver, from
      Hariprasad Kelam.

  28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
      of fifo sizes, from Corentin Labbe.

  29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.

  30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.

  31) Fix psample UAPI breakage, from Ido Schimmel"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
  psample: Fix user API breakage
  math: Export mul_u64_u64_div_u64
  ch_ktls: fix enum-conversion warning
  octeontx2-af: Fix memory leak of object buf
  ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
  net: bridge: don't notify switchdev for local FDB addresses
  net/sched: act_ct: clear post_ct if doing ct_clear
  net: dsa: don't assign an error value to tag_ops
  isdn: capi: fix mismatched prototypes
  net/mlx5: SF, do not use ecpu bit for vhca state processing
  net/mlx5e: Fix division by 0 in mlx5e_select_queue
  net/mlx5e: Fix error path for ethtool set-priv-flag
  net/mlx5e: Offload tuple rewrite for non-CT flows
  net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
  net/mlx5: Add back multicast stats for uplink representor
  net: ipconfig: ic_dev can be NULL in ic_close_devs
  MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
  docs: networking: Fix a typo
  r8169: fix DMA being used after buffer free if WoL is enabled
  net: ipa: fix init header command validation
  ...
2021-03-24 18:16:04 -07:00
Lyude Paul
d3999c1f7b drm/nouveau/kms/nve4-nv108: Limit cursors to 128x128
While Kepler does technically support 256x256 cursors, it turns out that
Kepler actually has some additional requirements for scanout surfaces that
we're not enforcing correctly, which aren't present on Maxwell and later.
Cursor surfaces must always use small pages (4K), and overlay surfaces must
always use large pages (128K).

Fixing this correctly though will take a bit more work: as we'll need to
add some code in prepare_fb() to move cursor FBs in large pages to small
pages, and vice-versa for overlay FBs. So until we have the time to do
that, just limit cursor surfaces to 128x128 - a size small enough to always
default to small pages.

This means small ovlys are still broken on Kepler, but it is extremely
unlikely anyone cares about those anyway :).

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: d3b2f0f792 ("drm/nouveau/kms/nv50-: Report max cursor size to userspace")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2021-03-25 10:00:04 +10:00
Ido Schimmel
e43accba9b psample: Fix user API breakage
Cited commit added a new attribute before the existing group reference
count attribute, thereby changing its value and breaking existing
applications on new kernels.

Before:

 # psample -l
 libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported

After:

 # psample -l
 Group Num       Refcount        Group Seq
 1               1               0

Fix by restoring the value of the old attribute and remove the
misleading comments from the enumerator to avoid future bugs.

Cc: stable@vger.kernel.org
Fixes: d8bed686ab ("net: psample: Add tunnel support")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Adiel Bidani <adielb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:44:31 -07:00
David S. Miller
bf45947864 math: Export mul_u64_u64_div_u64
Fixes: f51d7bf1db ("ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation")
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:42:54 -07:00
Roger Pau Monne
af44a387e7 Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
This partially reverts commit 882213990d ("xen: fix p2m size in dom0
for disabled memory hotplug case")

There's no need to special case XEN_UNPOPULATED_ALLOC anymore in order
to correctly size the p2m. The generic memory hotplug option has
already been tied together with the Xen hotplug limit, so enabling
memory hotplug should already trigger a properly sized p2m on Xen PV.

Note that XEN_UNPOPULATED_ALLOC depends on ZONE_DEVICE which pulls in
MEMORY_HOTPLUG.

Leave the check added to __set_phys_to_machine and the adjusted
comment about EXTRA_MEM_RATIO.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-3-roger.pau@citrix.com

[boris: fixed formatting issues]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-24 18:33:36 -05:00
Roger Pau Monne
2b514ec727 xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
The Xen memory hotplug limit should depend on the memory hotplug
generic option, rather than the Xen balloon configuration. It's
possible to have a kernel with generic memory hotplug enabled, but
without Xen balloon enabled, at which point memory hotplug won't work
correctly due to the size limitation of the p2m.

Rename the option to XEN_MEMORY_HOTPLUG_LIMIT since it's no longer
tied to ballooning.

Fixes: 9e2369c06c ("xen: add helpers to allocate unpopulated memory")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-2-roger.pau@citrix.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-24 18:33:11 -05:00
Arnd Bergmann
6f235a69e5 ch_ktls: fix enum-conversion warning
gcc points out an incorrect enum assignment:

drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c: In function 'chcr_ktls_cpl_set_tcb_rpl':
drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:684:22: warning: implicit conversion from 'enum <anonymous>' to 'enum ch_ktls_open_state' [-Wenum-conversion]

This appears harmless, and should apparently use 'CH_KTLS_OPEN_SUCCESS'
instead of 'false', with the same value '0'.

Fixes: efca3878a5 ("ch_ktls: Issue if connection offload fails")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:35:39 -07:00
Colin Ian King
9e0a537d06 octeontx2-af: Fix memory leak of object buf
Currently the error return path when lfs fails to allocate is not free'ing
the memory allocated to buf. Fix this by adding the missing kfree.

Addresses-Coverity: ("Resource leak")
Fixes: f788409714 ("octeontx2-af: Formatting debugfs entry rsrc_alloc.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:33:06 -07:00
Yangbo Lu
f51d7bf1db ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
Current calculation for diff of TMR_ADD register value may have
64-bit overflow in this code line, when long type scaled_ppm is
large.

adj *= scaled_ppm;

This patch is to resolve it by using mul_u64_u64_div_u64().

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:10:03 -07:00
Linus Torvalds
4ee998b0ef Three fixes for the Qualcomm clk driver, two for regressions this merge
window and one for a long standing problem that only popped up now that
 eMMC is being used.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmBbg6ARHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSUSnBAAz9crwEFRI5iZu+yubSStfNCNdXbH6eev
 UEMfi0G21EodS5D5qG2YcPmT4gDkpdkMGO/UosJWrTeFA09dImmmj0TeQ8S2KwHH
 GcOfoWCnMkC/qg/v8aSLVtbj6IORup/fq+oMyd9LdNRcNXg5DZrifzoJWcCXpXMX
 Q1dLYj1aL/JeLh842HxUH0YQI7CxlO/R2hLhYmCjO/ZFHDWpBUbjefv79P40ykV/
 jjCrU1roNPJipmS40puYbyMvPQTaGcXKAKq9n+fdBzuFUP5Sp4/bNPgA3rGO6ABw
 bSenFTfEuvEvSLds6oczSZk/hRhpBmcd865ryLG9ZiAerDX9cb21us0kIkvI6hwZ
 ywLzqRbWDPBrxXHZuUzoLbu4yIqY5wGCqpLmxH5CYoGcit7edlkdnaJPTCXBIen7
 +whoapOFGf5Mgh6hi7zKR9m53GtKTUt5MScVx3nk/iBmQ+OPKQ+DnukhYXXXggEj
 E7XzF8RWqEMMHd//V39RSAAJqNCS7K1t8XKpr0wYc1FP8YsPoiHP/tMNFnqoeptY
 hBQunoVkrDLIyKm/bL3VWFUJaOqEZajkrTvG9jKry+mzIVFjCboNFwDMZ5srEWuu
 XzqdoVvQEjOh1arLdK2KY2Y9xGPQAM/nrIMY8h/6CLHB10tniEP+Dl5y1r1yxtn3
 SJTAjGvN8GQ=
 =e9G9
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Three fixes for the Qualcomm clk driver: two for regressions this
  merge window and one for a long-standing problem that only popped up
  now that eMMC is being used"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: gcc-sc7180: Use floor ops for the correct sdcc1 clk
  clk: qcom: rcg2: Rectify clk_gfx3d rate rounding without mux division
  clk: qcom: rpmh: Update the XO clock source for SC7280
2021-03-24 11:26:50 -07:00
Linus Torvalds
a0a4df6a9e platform-drivers-x86 for v5.12-2
Summary:
  - dell-wmi-sysman: A set of probe-error-exit-handling fixes to fix some systems
    which advertise the WMI GUIDs, but are not compatible, not booting
  - intel-vbtn/intel-hid: Misc. bugfixes
  - intel_pmc: Bug-fixes + a quirk to lower suspend power-consumption on Tiger Lake
  - thinkpad_acpi: Misc. bugfixes
 
 The following is an automated git shortlog grouped by driver:
 
 dell-wmi-sysman:
  -  Cleanup create_attributes_level_sysfs_files()
  -  Make sysman_init() return -ENODEV of the interfaces are not found
  -  Cleanup sysman_init() error-exit handling
  -  Fix release_attributes_data() getting called twice on init_bios_attributes() failure
  -  Make it safe to call exit_foo_attributes() multiple times
  -  Fix possible NULL pointer deref on exit
  -  Fix crash caused by calling kset_unregister twice
 
 intel-hid:
  -  Support Lenovo ThinkPad X1 Tablet Gen 2
 
 intel-vbtn:
  -  Stop reporting SW_DOCK events
 
 intel_pmc_core:
  -  Ignore GBE LTR on Tiger Lake platforms
  -  Update Kconfig
 
 intel_pmt_class:
  -  Initial resource to 0
 
 intel_pmt_crashlog:
  -  Fix incorrect macros
 
 thinkpad_acpi:
  -  Disable DYTC CQL mode around switching to balanced mode
  -  Allow the FnLock LED to change state
  -  check dytc version for lapmode sysfs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmBbbL0UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yYEgf/dWwTip21gYoi02mdsHsPduaL0Mtu
 grcZpnRSEuWvgl5P26zttmLjAK4rTyySePVBWsxDwH/4qBqY2DCicSQfeQke2/9c
 PJU3i8zXTQxlBUWFrjM8vqFKdTypFXJwpdoBGQD3JJAh8LcSQj5xkhhDQVJYIXLQ
 HIxVM44gPLZc/lHOFGUEtREc2/k2/A09pER6udvVGxSy/Vz1w646G3u9f5edi1jz
 jX5HIlEtEYpZ55E8bQSUcMIVpiv6HLAu5qQXQ+1xeQXXwM7mM6gRpG8Qr9Cy70Aq
 us0AA5AjYd4IudlgFtUQ7NOB5YYEs2WHiFx4+ck0DSE7CMzcamnUNNp7Tg==
 =ybep
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Hans de Goede:
 "A set of bug-fixes and some model specific quirks.

  Summary:

   - dell-wmi-sysman: A set of probe-error-exit-handling fixes to fix
     some systems which advertise the WMI GUIDs, but are not compatible,
     not booting

   - intel-vbtn/intel-hid: Misc. bugfixes

   - intel_pmc: Bug-fixes + a quirk to lower suspend power-consumption
     on Tiger Lake

   - thinkpad_acpi: misc bugfixes"

* tag 'platform-drivers-x86-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: intel_pmc_core: Ignore GBE LTR on Tiger Lake platforms
  platform/x86: intel_pmc_core: Update Kconfig
  platform/x86: intel_pmt_crashlog: Fix incorrect macros
  platform/x86: intel_pmt_class: Initial resource to 0
  platform/x86: intel-vbtn: Stop reporting SW_DOCK events
  platform/x86: dell-wmi-sysman: Cleanup create_attributes_level_sysfs_files()
  platform/x86: dell-wmi-sysman: Make sysman_init() return -ENODEV of the interfaces are not found
  platform/x86: dell-wmi-sysman: Cleanup sysman_init() error-exit handling
  platform/x86: dell-wmi-sysman: Fix release_attributes_data() getting called twice on init_bios_attributes() failure
  platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times
  platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
  platform/x86: dell-wmi-sysman: Fix crash caused by calling kset_unregister twice
  platform/x86: thinkpad_acpi: Disable DYTC CQL mode around switching to balanced mode
  platform/x86: thinkpad_acpi: Allow the FnLock LED to change state
  platform/x86: thinkpad_acpi: check dytc version for lapmode sysfs
  platform/x86: intel-hid: Support Lenovo ThinkPad X1 Tablet Gen 2
2021-03-24 11:21:01 -07:00
Johannes Thumshirn
7de55b7d6f block: support zone append bvecs
Christoph reported that we'll likely trigger the WARN_ON_ONCE() checking
that we're not submitting a bvec with REQ_OP_ZONE_APPEND in
bio_iov_iter_get_pages() some time ago using zoned btrfs, but I couldn't
reproduce it back then.

Now Naohiro was able to trigger the bug as well with xfstests generic/095
on a zoned btrfs.

There is nothing that prevents bvec submissions via REQ_OP_ZONE_APPEND if
the hardware's zone append limit is met.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/10bd414d9326c90cd69029077db63b363854eee5.1616600835.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-24 11:36:51 -06:00
Linus Torvalds
8a9d2e133e cachefiles, afs: mm wait fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmBaVsAACgkQ+7dXa6fL
 C2u/7w/8DU9UZN3IRgZzR47xw3qYlgNMWRoiJ2RwSHYDJcsFqziJ/6jN/MDr7vzc
 eo1XQnDUH1Ok02WNxI6iVIfkX6cC/SidCWs6mNevQ6ksn9ei8tG0ZUWLcUl1IA+O
 HzXxvouyL9aJB+aNTQXttoi8JaSuoW/HBV3MbjOLywsy41AicCpt0gI0AJgXHKe8
 nEz3mqWZpCywRTkVkt9sWFOMX2shUzy8SoFgLMNpDUgyMD4r98XVJdIH8X4Em3zE
 syLg92aOnxxTEOAAYefcOSsgDBIkxLqW6F/K884cTPgLC24RJ/LO+M4GoOWX1Cmj
 Gqy9DZ3TGTu9yXr6Cm32OMl6t1Y0rYnktNl1Z4OT0XibK4gxgohZEr811A1/pHHu
 OfPBIUAotKRS4o/scs8Au0+XMT0/R7qfsGZe+TUGzWG1CRzf+tOLMrgXPxWnh2fV
 E2eNfOzy2Ry5v0XB4Lb4tb0JVPM2WOBTbswgUIHUOLz7fT6+mVaFYK/8eDDu6EJH
 zmDxs7HLZvI6X6XB2DOCDDWJbzKk9Jo27raGV5o6QCwAKENIr8XAvgZBEg5+Quvc
 feNBNSWTplgB5ROPlRWgmy/Xh4Y4+uRMCzMN+q9FtC810bDCE5rY5TRnayxmx9ni
 XugpJnoMBM8QcbtHNxropGOg+gQpABYfSfZMmcNPd+Oyix3SbtQ=
 =/IaF
 -----END PGP SIGNATURE-----

Merge tag 'afs-cachefiles-fixes-20210323' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull cachefiles and afs fixes from David Howells:
 "Fixes from Matthew Wilcox for page waiting-related issues in
  cachefiles and afs as extracted from his folio series[1]:

   - In cachefiles, remove the use of the wait_bit_key struct to access
     something that's actually in wait_page_key format. The proper
     struct is now available in the header, so that should be used
     instead.

   - Add a proper wait function for waiting killably on the page
     writeback flag. This includes a recent bugfix[2] that's not in the
     afs code.

   - In afs, use the function added in (2) rather than using
     wait_on_page_bit_killable() which doesn't provide the
     aforementioned bugfix"

Link: https://lore.kernel.org/r/20210320054104.1300774-1-willy@infradead.org[1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [2]
Link: https://lore.kernel.org/r/20210323120829.GC1719932@casper.infradead.org/ # v1

* tag 'afs-cachefiles-fixes-20210323' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Use wait_on_page_writeback_killable
  mm/writeback: Add wait_on_page_writeback_killable
  fs/cachefiles: Remove wait_bit_key layout dependency
2021-03-24 10:22:00 -07:00
Christian Brauner
bf1c82a538 cachefiles: do not yet allow on idmapped mounts
Based on discussions (e.g. in [1]) my understanding of cachefiles and
the cachefiles userspace daemon is that it creates a cache on a local
filesystem (e.g. ext4, xfs etc.) for a network filesystem. The way this
is done is by writing "bind" to /dev/cachefiles and pointing it to the
directory to use as the cache.

Currently this directory can technically also be an idmapped mount but
cachefiles aren't yet fully aware of such mounts and thus don't take the
idmapping into account when creating cache entries. This could leave
users confused as the ownership of the files wouldn't match to what they
expressed in the idmapping. Block cache files on idmapped mounts until
the fscache rework is done and we have ported it to support idmapped
mounts.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/lkml/20210303161528.n3jzg66ou2wa43qb@wittgenstein [1]
Link: https://lore.kernel.org/r/20210316112257.2974212-1-christian.brauner@ubuntu.com/ # v1
Link: https://listman.redhat.com/archives/linux-cachefs/2021-March/msg00044.html # v2
Link: https://lore.kernel.org/r/20210319114146.410329-1-christian.brauner@ubuntu.com/ # v3
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-24 10:20:22 -07:00
Vegard Nossum
25928deeb1 ACPICA: Always create namespace nodes using acpi_ns_create_node()
ACPICA commit 29da9a2a3f5b2c60420893e5c6309a0586d7a329

ACPI is allocating an object using kmalloc(), but then frees it
using kmem_cache_free(<"Acpi-Namespace" kmem_cache>).

This is wrong and can lead to boot failures manifesting like this:

    hpet0: 3 comparators, 64-bit 100.000000 MHz counter
    clocksource: Switched to clocksource tsc-early
    BUG: unable to handle page fault for address: 000000003ffe0018
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP PTI
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0+ #211
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
    RIP: 0010:kmem_cache_alloc+0x70/0x1d0
    Code: 00 00 4c 8b 45 00 65 49 8b 50 08 65 4c 03 05 6f cc e7 7e 4d 8b
20 4d 85 e4 0f 84 3d 01 00 00 8b 45 20 48 8b 7d 00 48 8d 4a 01 <49> 8b
   1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 c5 8b 45 20
    RSP: 0000:ffffc90000013df8 EFLAGS: 00010206
    RAX: 0000000000000018 RBX: ffffffff81c49200 RCX: 0000000000000002
    RDX: 0000000000000001 RSI: 0000000000000dc0 RDI: 000000000002b300
    RBP: ffff88803e403d00 R08: ffff88803ec2b300 R09: 0000000000000001
    R10: 0000000000000dc0 R11: 0000000000000006 R12: 000000003ffe0000
    R13: ffffffff8110a583 R14: 0000000000000dc0 R15: ffffffff81c49a80
    FS:  0000000000000000(0000) GS:ffff88803ec00000(0000)
knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000003ffe0018 CR3: 0000000001c0a001 CR4: 00000000003606f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __trace_define_field+0x33/0xa0
     event_trace_init+0xeb/0x2b4
     tracer_init_tracefs+0x60/0x195
     ? register_tracer+0x1e7/0x1e7
     do_one_initcall+0x74/0x160
     kernel_init_freeable+0x190/0x1f0
     ? rest_init+0x9a/0x9a
     kernel_init+0x5/0xf6
     ret_from_fork+0x35/0x40
    CR2: 000000003ffe0018
    ---[ end trace 707efa023f2ee960 ]---
    RIP: 0010:kmem_cache_alloc+0x70/0x1d0

Bisection leads to unrelated changes in slab; Vlastimil Babka
suggests an unrelated layout or slab merge change merely exposed
the underlying bug.

Link: https://lore.kernel.org/lkml/4dc93ff8-f86e-f4c9-ebeb-6d3153a78d03@oracle.com/
Link: https://lore.kernel.org/r/a1461e21-c744-767d-6dfc-6641fd3e3ce2@siemens.com
Link: https://github.com/acpica/acpica/commit/29da9a2a
Fixes: f79c8e4136 ("ACPICA: Namespace: simplify creation of the initial/default namespace")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Diagnosed-by: Vlastimil Babka <vbabka@suse.cz>
Diagnosed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Kaneda <erik.kaneda@intel.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-24 14:50:54 +01:00
Namhyung Kim
41d5854113 perf record: Fix memory leak in vDSO found using ASAN
I got several memory leak reports from Asan with a simple command.  It
was because VDSO is not released due to the refcount.  Like in
__dsos_addnew_id(), it should put the refcount after adding to the list.

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:38:56 -03:00
Thomas Richter
eb8f998bbc perf test: Remove now useless failing sub test "BPF relocation checker"
For some time now the 'perf test 42: BPF filter' returns an error on bpf
relocation subtest, at least on x86 and s390. This is caused by

  d859900c4c ("bpf, libbpf: support global data/bss/rodata sections")

which introduces support for global variables in eBPF programs.

Perf test 42.4 checks that the eBPF relocation fails when the eBPF program
contains a global variable. It returns OK when the eBPF program
could not be loaded and FAILED otherwise.

With above commit the test logic for the eBPF relocation is obsolete.
The loading of the eBPF now succeeds and the test always shows FAILED.

This patch removes the sub test completely.
Also a lot of eBPF program testing is done in the eBPF test suite,
it also contains tests for global variables.

Output before:
 42: BPF filter                          :
 42.1: Basic BPF filtering               : Ok
 42.2: BPF pinning                       : Ok
 42.3: BPF prologue generation           : Ok
 42.4: BPF relocation checker            : Failed
 #

Output after:
 # ./perf test -F 42
 42: BPF filter                          :
 42.1: Basic BPF filtering               : Ok
 42.2: BPF pinning                       : Ok
 42.3: BPF prologue generation           : Ok
 #

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210324083734.1953123-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:33:03 -03:00
Jiri Olsa
9f177fd8f2 perf daemon: Return from kill functions
We should return correctly and warn in both daemon_session__kill() and
daemon__kill() after we tried everything to kill sessions.  The current
code will keep on looping and waiting.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210320221013.1619613-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:24:00 -03:00
Jiri Olsa
1833b64fee perf daemon: Force waipid for all session on SIGCHLD delivery
If we don't process SIGCHLD before another comes, we will see just one
SIGCHLD as a result. In this case current code will miss exit
notification for a session and wait forever.

Adding extra waitpid check for all sessions when SIGCHLD is received, to
make sure we don't miss any session exit.

Also fix close condition for signal_fd.

Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210320221013.1619613-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:23:15 -03:00
Imre Deak
8840e3bd98 drm/i915: Fix the GT fence revocation runtime PM logic
To optimize some task deferring it until runtime resume unless someone
holds a runtime PM reference (because in this case the task can be done
w/o the overhead of runtime resume), we have to use the runtime PM
get-if-active logic: If the runtime PM usage count is 0 (and so
get-if-in-use would return false) the runtime suspend handler is not
necessarily called yet (it could be just pending), so the device is not
necessarily powered down, and so the runtime resume handler is not
guaranteed to be called.

The fence revocation depends on the above deferral, so add a
get-if-active helper and use it during fence revocation.

v2:
- Add code comment explaining the fence reg programming deferral logic
  to i915_vma_revoke_fence(). (Chris)
- Add Cc: stable and Fixes: tags. (Chris)
- Fix the function docbook comment.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.12+
Fixes: 181df2d458 ("drm/i915: Take rpm wakelock for releasing the fence on unbind")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210322204223.919936-1-imre.deak@intel.com
(cherry picked from commit 9d58aa4629)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-03-24 09:12:07 -04:00
Pavel Begunkov
a185f1db59 io_uring: do ctx sqd ejection in a clear context
WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
Call Trace:
 io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
 io_uring_release+0x3e/0x50 fs/io_uring.c:8646
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 io_run_task_work fs/io_uring.c:2238 [inline]
 io_run_task_work fs/io_uring.c:2228 [inline]
 io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so
fput callback (i.e. io_uring_release()) is enqueued through task_work,
and run by same cancellation. As it's deeply nested we can't do parking
or taking sqd->lock there, because its state is unclear. So avoid
ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it
in a clear context in io_ring_exit_work().

Fixes: f6d54255f4 ("io_uring: halt SQO submission on ctx exit")
Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-24 06:55:11 -06:00
Alex Deucher
5c458585c0 drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
Commit 098214999c added fetching of the AUX_DPHY register
values from the vbios, but it also changed the default values
in the case when there are no values in the vbios.  This causes
problems with displays with high refresh rates.  To fix this,
switch back to the original default value for AUX_DPHY_TX_CONTROL.

Fixes: 098214999c ("drm/amd/display: Read VBIOS Golden Settings Tbl")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1426
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Igor Kravchenko <Igor.Kravchenko@amd.com>
Cc: Aric Cyr <Aric.Cyr@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: stable@vger.kernel.org
2021-03-24 00:30:57 -04:00
Alex Deucher
c933b11109 drm/amdgpu: Add additional Sienna Cichlid PCI ID
Add new DID.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-03-24 00:29:37 -04:00
Vladimir Oltean
6ab4c3117a net: bridge: don't notify switchdev for local FDB addresses
As explained in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

the switchdev notifiers for FDB entries managed to have a zero-day bug.
The bridge would not say that this entry is local:

ip link add br0 type bridge
ip link set swp0 master br0
bridge fdb add dev swp0 00:01:02:03:04:05 master local

and the switchdev driver would be more than happy to offload it as a
normal static FDB entry. This is despite the fact that 'local' and
non-'local' entries have completely opposite directions: a local entry
is locally terminated and not forwarded, whereas a static entry is
forwarded and not locally terminated. So, for example, DSA would install
this entry on swp0 instead of installing it on the CPU port as it should.

There is an even sadder part, which is that the 'local' flag is implicit
if 'static' is not specified, meaning that this command produces the
same result of adding a 'local' entry:

bridge fdb add dev swp0 00:01:02:03:04:05 master

I've updated the man pages for 'bridge', and after reading it now, it
should be pretty clear to any user that the commands above were broken
and should have never resulted in the 00:01:02:03:04:05 address being
forwarded (this behavior is coherent with non-switchdev interfaces):
https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
If you're a user reading this and this is what you want, just use:

bridge fdb add dev swp0 00:01:02:03:04:05 master static

Because switchdev should have given drivers the means from day one to
classify FDB entries as local/non-local, but didn't, it means that all
drivers are currently broken. So we can just as well omit the switchdev
notifications for local FDB entries, which is exactly what this patch
does to close the bug in stable trees. For further development work
where drivers might want to trap the local FDB entries to the host, we
can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
selectively make drivers act upon that bit, while all the others ignore
those entries if the 'is_local' bit is set.

Fixes: 6b26b51b1d ("net: bridge: Add support for notifying devices about FDB add/del")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23 14:39:41 -07:00
Marcelo Ricardo Leitner
8ca1b090e5 net/sched: act_ct: clear post_ct if doing ct_clear
Invalid detection works with two distinct moments: act_ct tries to find
a conntrack entry and set post_ct true, indicating that that was
attempted. Then, when flow dissector tries to dissect CT info and no
entry is there, it knows that it was tried and no entry was found, and
synthesizes/sets
                  key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                                  TCA_FLOWER_KEY_CT_FLAGS_INVALID;
mimicing what OVS does.

OVS has this a bit more streamlined, as it recomputes the key after
trying to find a conntrack entry for it.

Issue here is, when we have 'tc action ct clear', it didn't clear
post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to
the above. The fix, thus, is to clear it.

Reproducer rules:
tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \
	protocol ip flower ip_proto tcp ct_state -trk \
	action ct zone 1 pipe \
	action goto chain 2
tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \
	protocol ip flower \
	action ct clear pipe \
	action goto chain 4
tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \
	protocol ip flower ct_state -trk \
	action mirred egress redirect dev enp130s0f1np1_0

With the fix, the 3rd rule matches, like it does with OVS kernel
datapath.

Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23 14:32:26 -07:00
Matthew Wilcox (Oracle)
75b6979961 afs: Use wait_on_page_writeback_killable
Open-coding this function meant it missed out on the recent bugfix
for waiters being woken by a delayed wake event from a previous
instantiation of the page[1].

[DH: Changed the patch to use vmf->page rather than variable page which
 doesn't exist yet upstream]

Fixes: 1cf7a1518a ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-4-willy@infradead.org
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [1]
2021-03-23 20:54:37 +00:00
Matthew Wilcox (Oracle)
e5dbd33218 mm/writeback: Add wait_on_page_writeback_killable
This is the killable version of wait_on_page_writeback.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-3-willy@infradead.org
2021-03-23 20:54:29 +00:00
Matthew Wilcox (Oracle)
39f985c8f6 fs/cachefiles: Remove wait_bit_key layout dependency
Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile.  Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility

A comment on the need to maintain structure layout equivalence was added by
Linus[1] and that is no longer applicable.

Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-cachefs@redhat.com
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]
2021-03-23 20:54:29 +00:00
David E. Box
d1635448f1 platform/x86: intel_pmc_core: Ignore GBE LTR on Tiger Lake platforms
Due to a HW limitation, the Latency Tolerance Reporting (LTR) value
programmed in the Tiger Lake GBE controller is not large enough to allow
the platform to enter Package C10, which in turn prevents the platform from
achieving its low power target during suspend-to-idle.  Ignore the GBE LTR
value on Tiger Lake. LTR ignore functionality is currently performed solely
by a debugfs write call. Split out the LTR code into its own function that
can be called by both the debugfs writer and by this work around.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Sasha Neftin <sasha.neftin@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
Link: https://lore.kernel.org/r/20210319201844.3305399-2-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-03-23 21:50:14 +01:00
David E. Box
269b04a509 platform/x86: intel_pmc_core: Update Kconfig
The intel_pmc_core driver is mostly used as a debugging driver for Intel
platforms that support SLPS0 (S0ix). But the driver may also be used to
communicate actions to the PMC in order to ensure transition to SLPS0 on
some systems and architectures. As such the driver should be built on all
platforms it supports. Indicate this in the Kconfig. Also update the list
of supported features.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Suggested-by: Mario Limonciello <mario.limonciello@dell.com>
Link: https://lore.kernel.org/r/20210319201844.3305399-1-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-03-23 21:50:08 +01:00
David E. Box
10c931cdfe platform/x86: intel_pmt_crashlog: Fix incorrect macros
Fixes off-by-one bugs in the macro assignments for the crashlog control
bits. Was initially tested on emulation but bug revealed after testing on
silicon.

Fixes: 5ef9998c96 ("platform/x86: Intel PMT Crashlog capability driver")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20210317024455.3071477-2-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-03-23 21:50:02 +01:00
David E. Box
7547deff8a platform/x86: intel_pmt_class: Initial resource to 0
Initialize the struct resource in intel_pmt_dev_register to zero to avoid a
fault should the char *name field be non-zero.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20210317024455.3071477-1-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-03-23 21:49:56 +01:00
Lukasz Luba
fb9d62b27a PM: EM: postpone creating the debugfs dir till fs_initcall
The debugfs directory '/sys/kernel/debug/energy_model' is needed before
the Energy Model registration can happen. With the recent change in
debugfs subsystem it's not allowed to create this directory at early
stage (core_initcall). Thus creating this directory would fail.

Postpone the creation of the EM debug dir to later stage: fs_initcall.

It should be safe since all clients: CPUFreq drivers, Devfreq drivers
will be initialized in later stages.

The custom debug log below prints the time of creation the EM debug dir
at fs_initcall and successful registration of EMs at later stages.

[    1.505717] energy_model: creating rootdir
[    3.698307] cpu cpu0: EM: created perf domain
[    3.709022] cpu cpu1: EM: created perf domain

Fixes: 56348560d4 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-23 19:53:48 +01:00
Linus Torvalds
7acac4b319 linux-kselftest-kunit-fixes-5.12-rc5.1
This KUnit update for Linux 5.12-rc5 consists of two fixes to kunit
 tool from David Gow.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmBaFeoACgkQCwJExA0N
 Qxz0Gg//Y3mX/SmBWYCB32Vm3nXOIqEPxDuTxn4V6l1XoSOjViDlqPjVMIeQ1/Cv
 R8Hx0rXjzU9e9tuQaHSCnp89vqjnhIB3ROPWABsL2784EDsCpIqlT2ivey9wwBva
 VImaTDFn4xlJjJG1L/vmlBzagczk02qSsvadbwC/+vLQSfSZJTZWD9c6TpmfF82T
 P+FbWs8+WgpilsPOCYqXRvH8HDSWR/oXFOFxJTg64N3c5Vj2JIgnBeaCs+yahvEA
 KnAt8988uhHH15ROsa93g6dIvQ4C7NT74IKZW8VLvkkUxAk86clQYRcZZdCBbZhc
 yL3X1RhP/8a452df2LO0xraZTaxc2XI8H1F2ZhsJmmeBL2r2dyEe/4mWQD944e0h
 ZGQqIi4+WiMV/+WyTwp05KxvPBaiG0GFFLznNEPQQLuP9Iwh8QBXnBpg+pw58pQ1
 rk0fi8H0IaDjQjbAKiILnxRuINMAhoMOaYuYpMy+mD60AwyPOCAv1dHRInS8SG9D
 UZxTDFyOQDJqSmLueqULbaWQ/21Jg8Jui4Mowf6DpXSXdPpJt4wHffBnWVW1OHqC
 pYTCFAZ+S6bSoEeTvnyHXKO9MtsS0kxS1O5V5LS+Sc0sUcf3sxr83gU9E7Jvnmum
 Os4HyFr5H4AAS4Yif2vgmB7Y2kIIgWViIcKzcGfXha8im8vcooM=
 =uSNy
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-fixes-5.12-rc5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit fixes from Shuah Khan:
 "Two fixes to the kunit tool from David Gow"

* tag 'linux-kselftest-kunit-fixes-5.12-rc5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: tool: Disable PAGE_POISONING under --alltests
  kunit: tool: Fix a python tuple typing error
2021-03-23 10:18:08 -07:00
David Jeffery
a958937ff1 block: recalculate segment count for multi-segment discards correctly
When a stacked block device inserts a request into another block device
using blk_insert_cloned_request, the request's nr_phys_segments field gets
recalculated by a call to blk_recalc_rq_segments in
blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to
handle multi-segment discards. For disk types which can handle
multi-segment discards like nvme, this results in discard requests which
claim a single segment when it should report several, triggering a warning
in nvme and causing nvme to fail the discard from the invalid state.

 WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
 ...
 nvme_setup_cmd+0x217/0x270 [nvme_core]
 nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
 __blk_mq_try_issue_directly+0xe7/0x1b0
 blk_mq_request_issue_directly+0x41/0x70
 ? blk_account_io_start+0x40/0x50
 dm_mq_queue_rq+0x200/0x3e0
 blk_mq_dispatch_rq_list+0x10a/0x7d0
 ? __sbitmap_queue_get+0x25/0x90
 ? elv_rb_del+0x1f/0x30
 ? deadline_remove_request+0x55/0xb0
 ? dd_dispatch_request+0x181/0x210
 __blk_mq_do_dispatch_sched+0x144/0x290
 ? bio_attempt_discard_merge+0x134/0x1f0
 __blk_mq_sched_dispatch_requests+0x129/0x180
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x47/0xe0
 __blk_mq_delay_run_hw_queue+0x15b/0x170
 blk_mq_sched_insert_requests+0x68/0xe0
 blk_mq_flush_plug_list+0xf0/0x170
 blk_finish_plug+0x36/0x50
 xlog_cil_committed+0x19f/0x290 [xfs]
 xlog_cil_process_committed+0x57/0x80 [xfs]
 xlog_state_do_callback+0x1e0/0x2a0 [xfs]
 xlog_ioend_work+0x2f/0x80 [xfs]
 process_one_work+0x1b6/0x350
 worker_thread+0x53/0x3e0
 ? process_one_work+0x350/0x350
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

This patch fixes blk_recalc_rq_segments to be aware of devices which can
have multi-segment discards. It calculates the correct discard segment
count by counting the number of bio as each discard bio is considered its
own segment.

Fixes: 1e739730c5 ("block: optionally merge discontiguous discard bios into a single request")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-23 10:39:57 -06:00