Commit graph

8659 commits

Author SHA1 Message Date
Linus Torvalds
f70efe54b9 work around gcc bugs with 'asm goto' with outputs
commit 68fb3ca0e4 upstream.

We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

The same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23 09:12:28 +01:00
Michael Ellerman
7ad4b2a6b2 powerpc: Fix build error due to is_valid_bugaddr()
[ Upstream commit f8d3555355 ]

With CONFIG_GENERIC_BUG=n the build fails with:

  arch/powerpc/kernel/traps.c:1442:5: error: no previous prototype for ‘is_valid_bugaddr’ [-Werror=missing-prototypes]
  1442 | int is_valid_bugaddr(unsigned long addr)
       |     ^~~~~~~~~~~~~~~~

The prototype is only defined, and the function is only needed, when
CONFIG_GENERIC_BUG=y, so move the implementation under that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130114433.3053544-2-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:46 +00:00
Naveen N Rao
1c077acf24 powerpc/ftrace: Fix stack teardown in ftrace_no_trace
[ Upstream commit 4b3338aaa7 ]

Commit 41a506ef71 ("powerpc/ftrace: Create a dummy stackframe to fix
stack unwind") added use of a new stack frame on ftrace entry to fix
stack unwind. However, the commit missed updating the offset used while
tearing down the ftrace stack when ftrace is disabled. Fix the same.

In addition, the commit missed saving the correct stack pointer in
pt_regs. Update the same.

Fixes: 41a506ef71 ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind")
Cc: stable@vger.kernel.org # v6.5+
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130065947.2188860-1-naveen@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:00:14 +01:00
Timothy Pearson
c23b9eaca8 powerpc: Don't clobber f0/vs0 during fp|altivec register save
commit 5e1d824f9a upstream.

During floating point and vector save to thread data f0/vs0 are
clobbered by the FPSCR/VSCR store routine. This has been obvserved to
lead to userspace register corruption and application data corruption
with io-uring.

Fix it by restoring f0/vs0 after FPSCR/VSCR store has completed for
all the FP, altivec, VMX register save paths.

Tested under QEMU in kvm mode, running on a Talos II workstation with
dual POWER9 DD2.2 CPUs.

Additional detail (mpe):

Typically save_fpu() is called from __giveup_fpu() which saves the FP
regs and also *turns off FP* in the tasks MSR, meaning the kernel will
reload the FP regs from the thread struct before letting the task use FP
again. So in that case save_fpu() is free to clobber f0 because the FP
regs no longer hold live values for the task.

There is another case though, which is the path via:
  sys_clone()
    ...
    copy_process()
      dup_task_struct()
        arch_dup_task_struct()
          flush_all_to_thread()
            save_all()

That path saves the FP regs but leaves them live. That's meant as an
optimisation for a process that's using FP/VSX and then calls fork(),
leaving the regs live means the parent process doesn't have to take a
fault after the fork to get its FP regs back. The optimisation was added
in commit 8792468da5 ("powerpc: Add the ability to save FPU without
giving it up").

That path does clobber f0, but f0 is volatile across function calls,
and typically programs reach copy_process() from userspace via a syscall
wrapper function. So in normal usage f0 being clobbered across a
syscall doesn't cause visible data corruption.

But there is now a new path, because io-uring can call copy_process()
via create_io_thread() from the signal handling path. That's OK if the
signal is handled as part of syscall return, but it's not OK if the
signal is handled due to some other interrupt.

That path is:

interrupt_return_srr_user()
  interrupt_exit_user_prepare()
    interrupt_exit_user_prepare_main()
      do_notify_resume()
        get_signal()
          task_work_run()
            create_worker_cb()
              create_io_worker()
                copy_process()
                  dup_task_struct()
                    arch_dup_task_struct()
                      flush_all_to_thread()
                        save_all()
                          if (tsk->thread.regs->msr & MSR_FP)
                            save_fpu()
                            # f0 is clobbered and potentially live in userspace

Note the above discussion applies equally to save_altivec().

Fixes: 8792468da5 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
Tested-by: Timothy Pearson <tpearson@raptorengineering.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
[mpe: Reword change log to describe exact path of corruption & other minor tweaks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1921539696.48534988.1700407082933.JavaMail.zimbra@raptorengineeringinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 08:51:15 +01:00
Nicholas Piggin
e6bc42fae6 KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
commit dc158d23b3 upstream.

Before running a guest, the host process (e.g., QEMU) FP/VEC registers
are saved if they were being used, similarly to when the kernel uses FP
registers. The guest values are then loaded into regs, and the host
process registers will be restored lazily when it uses FP/VEC.

KVM HV has a bug here: the host process registers do get saved, but the
user MSR bits remain enabled, which indicates the registers are valid
for the process. After they are clobbered by running the guest, this
valid indication causes the host process to take on the FP/VEC register
values of the guest.

Fixes: 34e119c96b ("KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231122025811.2973-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 08:51:15 +01:00
Christophe Leroy
9d4f7441cd powerpc: Only define __parse_fpscr() when required
[ Upstream commit c7e0d9bb91 ]

Clang 17 reports:

arch/powerpc/kernel/traps.c:1167:19: error: unused function '__parse_fpscr' [-Werror,-Wunused-function]

__parse_fpscr() is called from two sites. First call is guarded
by #ifdef CONFIG_PPC_FPU_REGS

Second call is guarded by CONFIG_MATH_EMULATION which selects
CONFIG_PPC_FPU_REGS.

So only define __parse_fpscr() when CONFIG_PPC_FPU_REGS is defined.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309210327.WkqSd5Bq-lkp@intel.com/
Fixes: b6254ced4d ("powerpc/signal: Don't manage floating point regs when no FPU")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5de2998c57f3983563b27b39228ea9a7229d4110.1695385984.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 11:52:12 +01:00
Michael Ellerman
ed0ba37e7b powerpc/mm: Fix boot crash with FLATMEM
[ Upstream commit daa9ada209 ]

Erhard reported that his G5 was crashing with v6.6-rc kernels:

  mpic: Setting up HT PICs workarounds for U3/U4
  BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe
  Faulting instruction address: 0xc00000000005dc40
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                T  6.6.0-rc3-PMacGS #1
  Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
  NIP:  c00000000005dc40 LR: c000000000066660 CTR: c000000000007730
  REGS: c0000000022bf510 TRAP: 0380   Tainted: G                T (6.6.0-rc3-PMacGS)
  MSR:  9000000000001032 <SF,HV,ME,IR,DR,RI>  CR: 44004242  XER: 00000000
  IRQMASK: 3
  GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac
  GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300
  GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000
  GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8
  GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000
  GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c
  GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000
  GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac
  NIP hash_page_do_lazy_icache+0x50/0x100
  LR  __hash_page_4K+0x420/0x590
  Call Trace:
    hash_page_mm+0x364/0x6f0
    do_hash_fault+0x114/0x2b0
    data_access_common_virt+0x198/0x1f0
  --- interrupt: 300 at mpic_init+0x4bc/0x10c4
  NIP:  c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000
  REGS: c0000000022bf9f0 TRAP: 0300   Tainted: G                T (6.6.0-rc3-PMacGS)
  MSR:  9000000000001032 <SF,HV,ME,IR,DR,RI>  CR: 24004248  XER: 00000000
  DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1
  ...
  NIP mpic_init+0x4bc/0x10c4
  LR  mpic_init+0x464/0x10c4
  --- interrupt: 300
    pmac_setup_one_mpic+0x258/0x2dc
    pmac_pic_init+0x28c/0x3d8
    init_IRQ+0x90/0x140
    start_kernel+0x57c/0x78c
    start_here_common+0x1c/0x20

A bisect pointed to the breakage beginning with commit 9fee28baa6 ("powerpc:
implement the new page table range API").

Analysis of the oops pointed to a struct page with a corrupted
compound_head being loaded via page_folio() -> _compound_head() in
hash_page_do_lazy_icache().

The access by the mpic code is to an MMIO address, so the expectation
is that the struct page for that address would be initialised by
init_unavailable_range(), as pointed out by Aneesh.

Instrumentation showed that was not the case, which eventually lead to
the realisation that pfn_valid() was returning false for that address,
causing the struct page to not be initialised.

Because the system is using FLATMEM, the version of pfn_valid() in
memory_model.h is used:

static inline int pfn_valid(unsigned long pfn)
{
	...
	return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
}

Which relies on max_mapnr being initialised. Early in boot max_mapnr is
zero meaning no PFNs are valid.

max_mapnr is initialised in mem_init() called via:

  start_kernel()
    mm_core_init()  # init/main.c:928
      mem_init()

But that is too late for the usage in init_unavailable_range() called via:

  start_kernel()
    setup_arch()    # init/main.c:893
      paging_init()
        free_area_init()
          init_unavailable_range()

Although max_mapnr is currently set in mem_init(), the value is actually
already available much earlier, as soon as mem_topology_setup() has
completed, which is also before paging_init() is called. So move the
initialisation there, which causes paging_init() to correctly initialise
the struct page and fixes the bug.

This bug seems to have been lurking for years, but went unnoticed
because the pre-folio code was inspecting the uninitialised page->flags
but not dereferencing it.

Thanks to Erhard and Aneesh for help debugging.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-08 14:11:02 +01:00
Christophe Leroy
c64c237275 powerpc/85xx: Fix math emulation exception
[ Upstream commit 8e8a12ecbc ]

Booting mpc85xx_defconfig kernel on QEMU leads to:

Bad trap at PC: fe9bab0, SR: 2d000, vector=800
awk[82]: unhandled trap (5) at 0 nip fe9bab0 lr fe9e01c code 5 in libc-2.27.so[fe5a000+17a000]
awk[82]: code: 3aa00000 3a800010 4bffe03c 9421fff0 7ca62b78 38a00000 93c10008 83c10008
awk[82]: code: 38210010 4bffdec8 9421ffc0 7c0802a6 <fc00048e> d8010008 4815190d 93810030
Trace/breakpoint trap
WARNING: no useful console

This is because allthough CONFIG_MATH_EMULATION is selected,
Exception 800 calls unknown_exception().

Call emulation_assist_interrupt() instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/066caa6d9480365da9b8ed83692d7101e10ac5f8.1695657339.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-08 14:10:58 +01:00
Michael Ellerman
8ac2689502 powerpc/47x: Fix 47x syscall return crash
commit f0eee815ba upstream.

Eddie reported that newer kernels were crashing during boot on his 476
FSP2 system:

  kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xb7ee2000
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K FSP-2
  Modules linked in:
  CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 #1
  Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2
  NIP:  b7ee2000 LR: 8c008000 CTR: 00000000
  REGS: bffebd83 TRAP: 0400   Not tainted (6.1.55-d23900f.ppcnf-fs p2)
  MSR:  00000030 <IR,DR>  CR: 00001000  XER: 20000000
  GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000
  GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000
  GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0
  GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0
  NIP [b7ee2000] 0xb7ee2000
  LR [8c008000] 0x8c008000
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 0000000000000000 ]---

The problem is in ret_from_syscall where the check for
icache_44x_need_flush is done. When the flush is needed the code jumps
out-of-line to do the flush, and then intends to jump back to continue
the syscall return.

However the branch back to label 1b doesn't return to the correct
location, instead branching back just prior to the return to userspace,
causing bogus register values to be used by the rfi.

The breakage was introduced by commit 6f76a01173
("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which
inadvertently removed the "1" label and reused it elsewhere.

Fix it by adding named local labels in the correct locations. Note that
the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n
compiles.

Fixes: 6f76a01173 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Eddie James <eajames@linux.ibm.com>
Tested-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-19 23:08:56 +02:00
Benjamin Gray
c93aa8cfae powerpc/watchpoints: Annotate atomic context in more places
[ Upstream commit 27646b2e02 ]

It can be easy to miss that the notifier mechanism invokes the callbacks
in an atomic context, so add some comments to that effect on the two
handlers we register here.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-4-bgray@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06 14:56:57 +02:00
Benjamin Gray
3632e9fd82 powerpc/watchpoint: Disable pagefaults when getting user instruction
[ Upstream commit 3241f260eb ]

This is called in an atomic context, so is not allowed to sleep if a
user page needs to be faulted in and has nowhere it can be deferred to.
The pagefault_disabled() function is documented as preventing user
access methods from sleeping.

In practice the page will be mapped in nearly always because we are
reading the instruction that just triggered the watchpoint trap.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-3-bgray@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06 14:56:57 +02:00
Benjamin Gray
16722418cb powerpc/watchpoints: Disable preemption in thread_change_pc()
[ Upstream commit cc879ab3ce ]

thread_change_pc() uses CPU local data, so must be protected from
swapping CPUs while it is reading the breakpoint struct.

The error is more noticeable after 1e60f3564b ("powerpc/watchpoints:
Track perf single step directly on the breakpoint"), which added an
unconditional __this_cpu_read() call in thread_change_pc(). However the
existing __this_cpu_read() that runs if a breakpoint does need to be
re-inserted has the same issue.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230829063457.54157-2-bgray@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06 14:56:57 +02:00
Russell Currey
f17d5efaaf powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
[ Upstream commit c37b6908f7 ]

fail_iommu_setup() registers the fail_iommu_bus_notifier struct to both
PCI and VIO buses.  struct notifier_block is a linked list node, so this
causes any notifiers later registered to either bus type to also be
registered to the other since they share the same node.

This causes issues in (at least) the vgaarb code, which registers a
notifier for PCI buses.  pci_notify() ends up being called on a vio
device, converted with to_pci_dev() even though it's not a PCI device,
and finally makes a bad access in vga_arbiter_add_pci_device() as
discovered with KASAN:

 BUG: KASAN: slab-out-of-bounds in vga_arbiter_add_pci_device+0x60/0xe00
 Read of size 4 at addr c000000264c26fdc by task swapper/0/1

 Call Trace:
   dump_stack_lvl+0x1bc/0x2b8 (unreliable)
   print_report+0x3f4/0xc60
   kasan_report+0x244/0x698
   __asan_load4+0xe8/0x250
   vga_arbiter_add_pci_device+0x60/0xe00
   pci_notify+0x88/0x444
   notifier_call_chain+0x104/0x320
   blocking_notifier_call_chain+0xa0/0x140
   device_add+0xac8/0x1d30
   device_register+0x58/0x80
   vio_register_device_node+0x9ac/0xce0
   vio_bus_scan_register_devices+0xc4/0x13c
   __machine_initcall_pseries_vio_device_init+0x94/0xf0
   do_one_initcall+0x12c/0xaa8
   kernel_init_freeable+0xa48/0xba8
   kernel_init+0x64/0x400
   ret_from_kernel_thread+0x5c/0x64

Fix this by creating separate notifier_block structs for each bus type.

Fixes: d6b9a81b2a ("powerpc: IOMMU fault injection")
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Add #ifdef to fix CONFIG_IBMVIO=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230322035322.328709-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:48 +02:00
Sourabh Jain
1d58a92469 powerpc/fadump: reset dump area size if fadump memory reserve fails
[ Upstream commit d1eb75e0df ]

In case fadump_reserve_mem() fails to reserve memory, the
reserve_dump_area_size variable will retain the reserve area size. This
will lead to /sys/kernel/fadump/mem_reserved node displaying an incorrect
memory reserved by fadump.

To fix this problem, reserve dump area size variable is set to 0 if fadump
failed to reserve memory.

Fixes: 8255da95e5 ("powerpc/fadump: release all the memory above boot memory size")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230704050715.203581-1-sourabhjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:47 +02:00
Nathan Lynch
b8fee83aa4 powerpc/rtas_flash: allow user copy to flash block cache objects
commit 4f3175979e upstream.

With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():

  kernel BUG at mm/usercopy.c:102!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
  NIP:  c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
  REGS: c0000000148c76a0 TRAP: 0700   Not tainted  (6.5.0-rc3+)
  MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002242  XER: 0000000c
  CFAR: c0000000001fbd34 IRQMASK: 0
  [ ... GPRs omitted ... ]
  NIP usercopy_abort+0xa0/0xb0
  LR  usercopy_abort+0x9c/0xb0
  Call Trace:
    usercopy_abort+0x9c/0xb0 (unreliable)
    __check_heap_object+0x1b4/0x1d0
    __check_object_size+0x2d0/0x380
    rtas_flash_write+0xe4/0x250
    proc_reg_write+0xfc/0x160
    vfs_write+0xfc/0x4e0
    ksys_write+0x90/0x160
    system_call_exception+0x178/0x320
    system_call_common+0x160/0x2c4

The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.

Fixes: 6d07d1cd30 ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:52:30 +02:00
Naveen N Rao
b8ea2a4691 powerpc/ftrace: Create a dummy stackframe to fix stack unwind
commit 41a506ef71 upstream.

With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:

/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
        Depth    Size   Location    (17 entries)
        -----    ----   --------
  0)     4144      32   ftrace_call+0x4/0x44
  1)     4112     432   get_page_from_freelist+0x26c/0x1ad0
  2)     3680     496   __alloc_pages+0x290/0x1280
  3)     3184     336   __folio_alloc+0x34/0x90
  4)     2848     176   vma_alloc_folio+0xd8/0x540
  5)     2672     272   __handle_mm_fault+0x700/0x1cc0
  6)     2400     208   handle_mm_fault+0xf0/0x3f0
  7)     2192      80   ___do_page_fault+0x3e4/0xbe0
  8)     2112     160   do_page_fault+0x30/0xc0
  9)     1952     256   data_access_common_virt+0x210/0x220
 10)     1696     400   0xc00000000f16b100
 11)     1296     384   load_elf_binary+0x804/0x1b80
 12)      912     208   bprm_execve+0x2d8/0x7e0
 13)      704      64   do_execveat_common+0x1d0/0x2f0
 14)      640     160   sys_execve+0x54/0x70
 15)      480      64   system_call_exception+0x138/0x350
 16)      416     416   system_call_common+0x160/0x2c4

Fix this by having ftrace create a dummy stackframe for the function
being traced. With this, backtraces now capture the function being
traced:

/sys/kernel/debug/tracing # cat stack_trace
        Depth    Size   Location    (17 entries)
        -----    ----   --------
  0)     3888      32   _raw_spin_trylock+0x8/0x70
  1)     3856     576   get_page_from_freelist+0x26c/0x1ad0
  2)     3280      64   __alloc_pages+0x290/0x1280
  3)     3216     336   __folio_alloc+0x34/0x90
  4)     2880     176   vma_alloc_folio+0xd8/0x540
  5)     2704     416   __handle_mm_fault+0x700/0x1cc0
  6)     2288      96   handle_mm_fault+0xf0/0x3f0
  7)     2192      48   ___do_page_fault+0x3e4/0xbe0
  8)     2144     192   do_page_fault+0x30/0xc0
  9)     1952     608   data_access_common_virt+0x210/0x220
 10)     1344      16   0xc0000000334bbb50
 11)     1328     416   load_elf_binary+0x804/0x1b80
 12)      912      64   bprm_execve+0x2d8/0x7e0
 13)      848     176   do_execveat_common+0x1d0/0x2f0
 14)      672     192   sys_execve+0x54/0x70
 15)      480      64   system_call_exception+0x138/0x350
 16)      416     416   system_call_common+0x160/0x2c4

This results in two additional stores in the ftrace entry code, but
produces reliable backtraces.

Fixes: 153086644f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable@vger.kernel.org
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230621051349.759567-1-naveen@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 12:08:21 +02:00
Michael Ellerman
484b8fb1ff powerpc/security: Fix Speculation_Store_Bypass reporting on Power10
commit 5bcedc5931 upstream.

Nageswara reported that /proc/self/status was showing "vulnerable" for
the Speculation_Store_Bypass feature on Power10, eg:

  $ grep Speculation_Store_Bypass: /proc/self/status
  Speculation_Store_Bypass:       vulnerable

But at the same time the sysfs files, and lscpu, were showing "Not
affected".

This turns out to simply be a bug in the reporting of the
Speculation_Store_Bypass, aka. PR_SPEC_STORE_BYPASS, case.

When SEC_FTR_STF_BARRIER was added, so that firmware could communicate
the vulnerability was not present, the code in ssb_prctl_get() was not
updated to check the new flag.

So add the check for SEC_FTR_STF_BARRIER being disabled. Rather than
adding the new check to the existing if block and expanding the comment
to cover both cases, rewrite the three cases to be separate so they can
be commented separately for clarity.

Fixes: 84ed26fd00 ("powerpc/security: Add a security feature for STF barrier")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230517074945.53188-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:32 +02:00
Aditya Gupta
865d128cab powerpc: update ppc_save_regs to save current r1 in pt_regs
[ Upstream commit b684c09f09 ]

ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.

When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.

GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.

GDB backtrace with vmcore of kernel without this patch:

---------
(gdb) bt
 #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
    newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
 #2  0x0000000000000063 in ?? ()
 #3  0xc000000003579320 in ?? ()
---------

Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.

GDB backtrace with vmcore of patched kernel:

--------
(gdb) bt
 #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
    at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
 #2  0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
    at kernel/panic.c:358
 #3  0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
 #4  0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
    at drivers/tty/sysrq.c:602
 #5  0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
    count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
 #6  0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
    buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
 #7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
    ppos=<optimized out>) at fs/proc/inode.c:352
 #8  0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
    buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
    count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
 #9  0xc0000000005b4264 in ksys_write (fd=<optimized out>,
    buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
    at fs/read_write.c:637
 #10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
    at arch/powerpc/kernel/syscall.c:171
 #11 0xc00000000000c270 in system_call_vectored_common ()
    at arch/powerpc/kernel/interrupt_64.S:192
--------

Nick adds:
  So this now saves regs as though it was an interrupt taken in the
  caller, at the instruction after the call to ppc_save_regs, whereas
  previously the NIP was there, but R1 came from the caller's caller and
  that mismatch is what causes gdb's dwarf unwinder to go haywire.

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Fixes: d16a58f885 ("powerpc: Improve ppc_save_regs()")
Reivewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230615091047.90433-1-adityag@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:39 +02:00
Nicholas Piggin
4cff1be1cb powerpc: simplify ppc_save_regs
[ Upstream commit 37195b820d ]

Adjust the pt_regs pointer so the interrupt frame offsets can be used
to save registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-7-npiggin@gmail.com
Stable-dep-of: b684c09f09 ("powerpc: update ppc_save_regs to save current r1 in pt_regs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:39 +02:00
Christophe Leroy
7cfd310111 powerpc/signal32: Force inlining of __unsafe_save_user_regs() and save_tm_user_regs_unsafe()
[ Upstream commit a03b1a0b19 ]

Looking at generated code for handle_signal32() shows calls to a
function called __unsafe_save_user_regs.constprop.0 while user access
is open.

And that __unsafe_save_user_regs.constprop.0 function has two nops at
the begining, allowing it to be traced, which is unexpected during
user access open window.

The solution could be to mark __unsafe_save_user_regs() no trace, but
to be on the safe side the most efficient is to flag it __always_inline
as already done for function __unsafe_restore_general_regs(). The
function is relatively small and only called twice, so the size
increase will remain in the noise.

Do the same with save_tm_user_regs_unsafe() as it may suffer the
same issue.

Fixes: ef75e73182 ("powerpc/signal32: Transform save_user_regs() and save_tm_user_regs() in 'unsafe' version")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7e469c8f01860a69c1ada3ca6a5e2aa65f0f74b2.1685955220.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:37 +02:00
Christophe Leroy
af0c61c5bb powerpc/interrupt: Don't read MSR from interrupt_exit_kernel_prepare()
[ Upstream commit 0eb089a72f ]

A disassembly of interrupt_exit_kernel_prepare() shows a useless read
of MSR register. This is shown by r9 being re-used immediately without
doing anything with the value read.

  c000e0e0:       60 00 00 00     nop
  c000e0e4:       7d 3a c2 a6     mfmd_ap r9
  c000e0e8:       7d 20 00 a6     mfmsr   r9
  c000e0ec:       7c 51 13 a6     mtspr   81,r2
  c000e0f0:       81 3f 00 84     lwz     r9,132(r31)
  c000e0f4:       71 29 80 00     andi.   r9,r9,32768

This is due to the use of local_irq_save(). The flags read by
local_irq_save() are never used, use local_irq_disable() instead.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/df36c6205ab64326fb1b991993c82057e92ace2f.1685955214.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:37 +02:00
Gaurav Batra
9a74146540 powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV device
commit 1f7aacc5eb upstream.

For an SR-IOV device, while enabling DDW, a new table is created and
added at index 1 in the group. In the below 2 scenarios, the table is
incorrectly referenced at index 0 (which is where the table is for
default DMA window).

1. When adding DDW

   This issue is exposed with "slub_debug". Error thrown out from
   dma_iommu_dma_supported()

   Warning: IOMMU offset too big for device mask
   mask: 0xffffffff, table offset: 0x800000000000000

2. During Dynamic removal of the PCI device.

   Error is from iommu_tce_table_put() since a NULL table pointer is
   passed in.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230505184701.91613-1-gbatra@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-24 17:32:52 +01:00
Gaurav Batra
fc983cf5dd powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
commit 096339ab84 upstream.

When DMA window is backed by 2MB TCEs, the DMA address for the mapped
page should be the offset of the page relative to the 2MB TCE. The code
was incorrectly setting the DMA address to the beginning of the TCE
range.

Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
ethernet port, when DMA window is backed by 2MB TCEs.

Fixes: 3872731187 ("powerps/pseries/dma: Add support for 2M IOMMU page size")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Greg Joyce <gjoyce@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230504175913.83844-1-gbatra@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-24 17:32:52 +01:00
Nathan Lynch
46f773f39e powerpc/rtas: use memmove for potentially overlapping buffer copy
[ Upstream commit 271208ee5e ]

Using memcpy() isn't safe when buf is identical to rtas_err_buf, which
can happen during boot before slab is up. Full context which may not
be obvious from the diff:

	if (altbuf) {
		buf = altbuf;
	} else {
		buf = rtas_err_buf;
		if (slab_is_available())
			buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
	}
	if (buf)
		memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);

This was found by inspection and I'm not aware of it causing problems
in practice. It appears to have been introduced by commit
033ef338b6 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
old ppc64 version of this code did not have this problem.

Use memmove() instead.

Fixes: 033ef338b6 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-2-010e4416f13f@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:31 +09:00
Jens Axboe
064a1c7b0f powerpc: Don't try to copy PPR for task with NULL pt_regs
commit fd72761894 upstream.

powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:

  Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
  BUG: Kernel NULL pointer dereference on read at 0x00000160
  Faulting instruction address: 0xc0000000000c3a60
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
  Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
  CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP:  c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
  REGS: c0000000041833b0 TRAP: 0300   Not tainted  (6.3.0-rc2+)
  MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88082828  XER: 200400f8
  ...
  NIP memcpy_power7+0x200/0x7d0
  LR  ppr_get+0x64/0xb0
  Call Trace:
    ppr_get+0x40/0xb0 (unreliable)
    __regset_get+0x180/0x1f0
    regset_get_alloc+0x64/0x90
    elf_core_dump+0xb98/0x1b60
    do_coredump+0x1c34/0x24a0
    get_signal+0x71c/0x1410
    do_notify_resume+0x140/0x6f0
    interrupt_exit_user_prepare_main+0x29c/0x320
    interrupt_exit_user_prepare+0x6c/0xa0
    interrupt_return_srr_user+0x8/0x138

Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.

Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.

Fixes: fa439810cc ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-06 12:10:53 +02:00
Rohan McLure
f69bbc0bd6 powerpc/kcsan: Exclude udelay to prevent recursive instrumentation
[ Upstream commit 2a7ce82dc4 ]

In order for KCSAN to increase its likelihood of observing a data race,
it sets a watchpoint on memory accesses and stalls, allowing for
detection of conflicting accesses by other kernel threads or interrupts.

Stalls are implemented by injecting a call to udelay in instrumented code.
To prevent recursive instrumentation, exclude udelay from being instrumented.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:31 +01:00
Nicholas Piggin
07a5f4e697 powerpc/64: Move paca allocation to early_setup()
[ Upstream commit dc222fa773 ]

The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.

early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.

One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.

Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(), it just looks like all this code might have
been put here to allocate somewhere to store boot CPU hardware id while
scanning the devtree.

Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.

The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:31 +01:00
Nicholas Piggin
be2e726fa0 powerpc/64: Fix task_cpu in early boot when booting non-zero cpuid
[ Upstream commit 9fa24404f5 ]

powerpc/64 can boot on a non-zero SMP processor id. Initially, the boot
CPU is said to be "assumed to be 0" until early_init_devtree() discovers
the id from the device tree. That is not a good description because the
assumption can be wrong and that has to be handled, the better
description is that 0 is used as a placeholder, and things are fixed
after the real id is discovered.

smp_processor_id() is set to the boot cpuid, but task_cpu(current) is
not, which causes the smp_processor_id() == task_cpu(current) invariant
to be broken until init_idle() in sched_init().

This is quite fragile and could lead to subtle bugs in future. One bug
is that validate_sp_size uses task_cpu() to get the process stack, so
any stack trace from the booting CPU between early_init_devtree()
and sched_init() will have problems. Early on paca_ptrs[0] will be
poisoned, so that can cause machine checks dereferencing that memory
in real mode. Later, validating the current stack pointer against the
idle task of a different secondary will probably cause no stack trace
to be printed.

Fix this by setting thread_info->cpu right after smp_processor_id() is
set to the boot cpuid.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build as reported by sfr]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-3-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:31 +01:00
Greg Kroah-Hartman
24c1bd1cd0 powerpc/iommu: fix memory leak with using debugfs_lookup()
[ Upstream commit b505063910 ]

When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:30 +01:00
Nicholas Piggin
ac872bb004 powerpc/64: Don't recurse irq replay
[ Upstream commit 5746ca131e ]

Interrupt handlers called by soft-pending irq replay code can run
softirqs, softirq replay enables and disables local irqs, which allows
interrupts to come in including soft-masked interrupts, and it can
cause pending irqs to be replayed again. That makes the soft irq replay
state machine and possible races more complicated and fragile than it
needs to be.

Use irq_enter/irq_exit around irq replay to prevent softirqs running
while interrupts are being replayed. Softirqs will now be run at the
irq_exit() call after all the irq replaying is done. This prevents irqs
being replayed while irqs are being replayed, and should hopefully make
things simpler and easier to think about and debug.

A new PACA_IRQ_REPLAYING is added to prevent asynchronous interrupt
handlers hard-enabling EE while pending irqs are being replayed, because
that causes new pending irqs to arrive which is also a complexity. This
means pending irqs won't be profiled quite so well because perf irqs
can't be taken.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121102618.2824429-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:30 +01:00
Michael Ellerman
a783d7f7cb powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
commit 07b050f929 upstream.

Relocatable kernels must not discard relocations, they need to be
processed at runtime. As such they are included for CONFIG_RELOCATABLE
builds in the powerpc linker script (line 340).

However they are also unconditionally discarded later in the
script (line 414). Previously that worked because the earlier inclusion
superseded the discard.

However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 137). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier, causing .rela* to
actually be discarded at link time, leading to build warnings and a
kernel that doesn't boot:

  ld: warning: discarding dynamic section .rela.init.rodata

Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
is disabled.

Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
2023-02-25 11:25:42 +01:00
Michael Ellerman
9c87fd4a30 powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT
commit 4b9880dbf3 upstream.

The powerpc linker script explicitly includes .exit.text, because
otherwise the link fails due to references from __bug_table and
__ex_table. The code is freed (discarded) at runtime along with
.init.text and data.

That has worked in the past despite powerpc not defining
RUNTIME_DISCARD_EXIT because DISCARDS appears late in the powerpc linker
script (line 410), and the explicit inclusion of .exit.text
earlier (line 280) supersedes the discard.

However commit 99cb0d917f ("arch: fix broken BuildID for arm64 and
riscv") introduced an earlier use of DISCARD as part of the RO_DATA
macro (line 136). With binutils < 2.36 that causes the DISCARD
directives later in the script to be applied earlier [1], causing
.exit.text to actually be discarded at link time, leading to build
errors:

  '.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
  discarded section '.exit.text' of crypto/algboss.o
  '.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
  discarded section '.exit.text' of drivers/nvdimm/core.o

Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
DISCARDS macro to not include .exit.text at all.

1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/

Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25 11:25:42 +01:00
Nicholas Piggin
63c12d00d8 powerpc/64: Fix perf profiling asynchronous interrupt handlers
[ Upstream commit c28548012e ]

Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].

This also tidies some of the warnings, no need to duplicate them in
both should_hard_irq_enable() and do_hard_irq_enable().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-22 12:59:43 +01:00
Nicholas Piggin
6f097c2481 powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
commit 2ea31e2e62 upstream.

The RFI and STF security mitigation options can flip the
interrupt_exit_not_reentrant static branch condition concurrently with
the interrupt exit code which tests that branch.

Interrupt exit tests this condition to set MSR[EE|RI] for exit, then
again in the case a soft-masked interrupt is found pending, to recover
the MSR so the interrupt can be replayed before attempting to exit
again. If the condition changes between these two tests, the MSR and irq
soft-mask state will become corrupted, leading to warnings and possible
crashes. For example, if the branch is initially true then false,
MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get
enabled, leading to warnings in irq_64.c.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-14 19:11:53 +01:00
Nathan Lynch
482d990a5d powerpc/rtas: avoid scheduling in rtas_os_term()
[ Upstream commit 6c606e57ee ]

It's unsafe to use rtas_busy_delay() to handle a busy status from
the ibm,os-term RTAS function in rtas_os_term():

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Tainted: G      D            6.0.0-rc5-02182-gf8553a572277-dirty #9
Call Trace:
[c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
[c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
[c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
[c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
[c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
[c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
[c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
[c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
[c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
[c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
[c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

Use rtas_busy_delay_time() instead, which signals without side effects
whether to attempt the ibm,os-term RTAS call again.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-04 11:28:58 +01:00
Nathan Lynch
464d10e8d7 powerpc/rtas: avoid device tree lookups in rtas_os_term()
[ Upstream commit ed2213bfb1 ]

rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.

Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-04 11:28:57 +01:00
Nicholas Piggin
eb761a1760 powerpc: Fix writable sections being moved into the rodata region
.data.rel.ro*  catches .data.rel.root_cpuacct, and the kernel crashes on
a store in css_clear_dir. At least we know read-only data protection is
working...

Fixes: b6adc6d6d3 ("powerpc/build: move .data.rel.ro, .sdata2 to read-only")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221116043954.3307852-1-npiggin@gmail.com
2022-11-16 21:37:14 +11:00
Andreas Schwab
ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Nicholas Piggin
65722736c3 powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
Commit a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when
returning to soft-masked context") fixed the problem of pending irqs
being cleared when clearing the HARD_DIS bit, but then it didn't clear
the bit at all. This change clears HARD_DIS without affecting other bits
in the mask.

When an interrupt hits in a soft-masked section that has MSR[EE]=1, it
can hard disable and set PACA_IRQS_HARD_DIS, which must be cleared when
returning to the EE=1 caller (unless it was set due to a MUST_HARD_MASK
interrupt becoming pending). Failure to clear this leaves the
returned-to context running with MSR[EE]=1 and PACA_IRQS_HARD_DIS, which
confuses irq assertions and could be dangerous for code that might test
the flag.

This was observed in a hash MMU kernel where a kernel hash fault hits in
a local_irqs_disabled region that has EE=1. The hash fault also runs
with EE=1, then as it returns, a decrementer hits in the restart section
and the irq restart code hard-masks which sets the PACA_IRQ_HARD_DIS
flag, which is not clear when the original context is returned to.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221022052207.471328-1-npiggin@gmail.com
2022-10-27 00:38:35 +11:00
Nicholas Piggin
dc398a084d powerpc/64s/interrupt: Perf NMI should not take normal exit path
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash in fixups from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006140413.126443-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin
a073672eb0 powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
NMI PMIs really should not return using the normal interrupt_return
function. If such a PMI hits in code returning to user with the context
switched to user mode, this warning can fire. This was enough to cause
crashes when reproducing on 64s, because another perf interrupt would
hit while reporting bug, and that would cause another bug, and so on
until smashing the stack.

Work around that particular crash for now by just disabling that context
warning for PMIs. This is a hack and not a complete fix, there could be
other such problems lurking in corners. But it does fix the known crash.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Linus Torvalds
f1947d7c8a Random number generator fixes for Linux 6.1-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
 A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
 qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
 R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
 lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
 MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
 8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
 XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
 Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
 vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
 4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
 lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
 =M+mV
 -----END PGP SIGNATURE-----

Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds
70609c1495 powerpc fixes for 6.1 #2
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs.
    Notably this broke ftruncate64 & pread/write64, which can lead to file corruption.
 
  - Fix lost interrupts when returning to soft-masked context on 64-bit.
 
  - Fix build failure when CONFIG_DTL=n.
 
 Thanks to: Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNJQp8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPd3D/9+uPapAs6/ZGkrbojseiF/fZNyIfTF
 HcLZ3iWUP9+APuaRxd1m3HbBoWdgFQ9yC5abYnX1z+E/yEhbTaOof9Gu1vXfxbIF
 Orlc+X12uZMrdU1L2qlgk/89QLeKOTlp/b2eUXJ332Vl4DfJDoknMsqTqbMz9NrL
 EDj6vmG3JzJgKHmplwxd7d4CExQNh9khZr3lZ6Ae/hC1cZ1PVXD67akwxsc0i8E2
 dBGzwsP1SYXPrBMXONT1GNs/whXrKBHNyLWFueab3p0FsSkbc4sY/HaAPzjq7YzK
 aPV0FQi2YpIZ5IIunZA+vW+tQ8EvyOpOUPMGb7iiN/31YcFbf/k6Y92AlCFlHRq3
 9vvsNGzDeFBfvH2wWcR1XaAMQHmBpC4L+dYbCKIyuK9DjgZtpZUV3acagRMFXqMN
 TeN9h2GXAYxCdNc27I5v3wAvbNcx69E4Bs5F7voKGD4pb7V/4d/li2oix6v0FKIy
 qAt1PGhD4IgPE3oJOEOYsJRVQNyDfG4eKlFh5ijyyc5hQeGNeKiAH+G+a61R3ueo
 GvVOdkR/1QxK1XkbgrMyclb0nHN1kEjRHeIcMFW1+dMKA9UvOlJ6u8nkOFRvA3js
 r00O1uE7DT5WzdGXgEHCCDd7PybBKxpM32m4/vzgDM7xG+DHxp3j7LTQsDOfkFsH
 bkc31LqrAw3yDQ==
 =B+7s
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

 - Fix lost interrupts when returning to soft-masked context on 64-bit.

 - Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
2022-10-14 11:16:18 -07:00
Nicholas Piggin
a4cb3651a1 powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Fixes: e485f6c751 ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
2022-10-13 22:29:49 +11:00
Linus Torvalds
676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Nicholas Piggin
e237506238 powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.

The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.

With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.

Fix this by having 32-bit kernels share those syscall definitions with
compat.

Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
2022-10-13 00:49:58 +11:00
Jason A. Donenfeld
81895a65ec treewide: use prandom_u32_max() when possible, part 1
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:55 -06:00
Joel Stanley
ae5b6779fa powerpc: Fix 85xx build
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.

Fixes: 8afc66e8d4 ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-11 10:13:34 -07:00
Linus Torvalds
27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds
d4013bc4d4 bitmap patches for v6.1-rc1
From Phil Auld:
 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
 
 From me:
 cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess
 
 This series cleans that mess and adds new config FORCE_NR_CPUS that
 allows to optimize cpumask subsystem if the number of CPUs is known
 at compile-time.
 
 From me:
 lib: optimize find_bit() functions
 
 Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.
 
 From me:
 lib/find: add find_nth_bit()
 
 Adds find_nth_bit(), which is ~70 times faster than bitcounting with
 for_each() loop:
         for_each_set_bit(bit, mask, size)
                 if (n-- == 0)
                         return bit;
 
 Also adds bitmap_weight_and() to let people replace this pattern:
 	tmp = bitmap_alloc(nbits);
 	bitmap_and(tmp, map1, map2, nbits);
 	weight = bitmap_weight(tmp, nbits);
 	bitmap_free(tmp);
 with a single bitmap_weight_and() call.
 
 From me:
 cpumask: repair cpumask_check()
 
 After switching cpumask to use nr_cpu_ids, cpumask_check() started
 generating many false-positive warnings. This series fixes it.
 
 From Valentin Schneider:
 bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot()
 
 Extends the API with one more function and applies it in sched/core.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof
 vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow
 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD
 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6
 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc
 /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001
 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm
 r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV
 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3
 GR7sMQae
 =Q/+F
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds new config FORCE_NR_CPUS that
   allows to optimize cpumask subsystem if the number of CPUs is known
   at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with
   for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
   Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
2022-10-10 12:49:34 -07:00