Commit graph

Andrew Donnellan
f74dcbfd27 powerpc/pseries: Fix handling of PLPKS object flushing timeout
plpks_confirm_object_flushed() uses the H_PKS_CONFIRM_OBJECT_FLUSHED hcall
to check whether changes to an object in the Platform KeyStore have been
flushed to non-volatile storage.

The hcall returns two output values, the return code and the flush status.
plpks_confirm_object_flushed() polls the hcall until either the flush
status has updated, the return code is an error, or a timeout has been
exceeded.

While we're still polling, the hcall is returning H_SUCCESS (0) as the
return code. In the timeout case, this means that upon exiting the polling
loop, rc is 0, and therefore 0 is returned to the user.

Handle the timeout case separately and return ETIMEDOUT if triggered.
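
A minimal, standalone sketch of the polling pattern described above, with a
hypothetical stand-in for the hcall (the real driver uses
H_PKS_CONFIRM_OBJECT_FLUSHED and plpks-specific status values):

  #include <errno.h>

  /* Hypothetical stand-in: returns the hcall rc and fills in *status. */
  static int query_flush_status(int *status)
  {
          *status = 0;            /* 0: still flushing, non-zero: done */
          return 0;               /* 0: H_SUCCESS */
  }

  static int wait_for_flush(int max_polls)
  {
          int rc, status, i;

          for (i = 0; i < max_polls; i++) {
                  rc = query_flush_status(&status);
                  if (rc)                 /* hcall error: propagate it */
                          return rc;
                  if (status)             /* flush status updated: done */
                          return 0;
          }

          /* rc is still 0 (success) here, so report the timeout explicitly. */
          return -ETIMEDOUT;
  }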

Fixes: 2454a7af0f ("powerpc/pseries: define driver for Platform KeyStore")
Reported-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Tested-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-2-ajd@linux.ibm.com
2023-02-12 22:12:36 +11:00
Michael Ellerman
fc8a898cfd Merge branch 'fixes' into next
Merge our fixes branch to bring in some changes that conflict with
upcoming next content.
2023-02-12 22:11:56 +11:00
Geoff Levand
544f823ec7 powerpc/ps3: Refresh ps3_defconfig
Refresh ps3_defconfig for v6.2.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99e87549b17feca3494e9df6f4def04a9ec7c042.1672767868.git.geoff@infradead.org
2023-02-12 22:11:38 +11:00
Geoff Levand
5705c6d97e powerpc/ps3: Change updateboltedpp() panic to info
Commit fdacae8a84 ("powerpc: Activate CONFIG_STRICT_KERNEL_RWX by
default") causes ps3_hpte_updateboltedpp() to be called.

The correct fix would be to implement updateboltedpp() for PS3, but it's
not clear if that's possible. As a stop-gap, change the panic statement
in ps3_hpte_updateboltedpp() to a pr_info statement so that bootup can
continue.

Signed-off-by: Geoff Levand <geoff@infradead.org>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2df879d982809c05b0dfade57942fe03dbe9e7de.1672767868.git.geoff@infradead.org
2023-02-12 22:11:35 +11:00
Rohan McLure
6f0926c005 powerpc/kcsan: Add KCSAN Support
Enable HAVE_ARCH_KCSAN for 64-bit Book3S, permitting use of the kernel
concurrency sanitiser through the CONFIG_KCSAN_* kconfig options. KCSAN
requires the compiler's __atomic_* builtins for 64-bit values, so support
is only reported on 64-bit.

See documentation in Documentation/dev-tools/kcsan.rst for more
information.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
[mpe: Limit to Book3S to avoid build failure on Book3E]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-6-rmclure@linux.ibm.com
2023-02-12 22:09:20 +11:00
Rohan McLure
4f8e09106f powerpc/kcsan: Prevent recursive instrumentation with IRQ save/restores
Instrumented memory accesses provided by KCSAN will access core-local
memories (which will save and restore IRQs) as well as restore IRQs
directly. Avoid recursive instrumentation by applying the __no_kcsan
annotation to IRQ restore routines.
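
A rough sketch of the annotation pattern (the kernel's __no_kcsan comes from
its compiler headers; a local stand-in is defined here so the snippet is
self-contained, and the function body is illustrative only):

  /* Stand-in for the kernel's __no_kcsan (maps to a no_sanitize attribute). */
  #define __no_kcsan __attribute__((no_sanitize("thread")))

  /* Applying __no_kcsan keeps KCSAN from instrumenting the accesses inside,
   * so an instrumented access elsewhere cannot recurse back through here. */
  static __no_kcsan void irq_restore_sketch(unsigned long flags)
  {
          (void)flags;            /* restore the interrupt state here */
  }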

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
[mpe: Resolve merge conflict with IRQ replay recursion changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-5-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Rohan McLure
b6e259297a powerpc/kcsan: Memory barriers semantics
Annotate memory barriers *mb() with calls to kcsan_mb(), signaling to
compilers supporting KCSAN that the respective memory barrier has been
issued. Rename the memory barriers *mb() to __*mb() to opt in to
asm-generic/barrier.h generating the respective *mb() macros.
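
Roughly, the pattern looks like the sketch below (kcsan_mb() is stubbed out so
the snippet stands alone; in the kernel it comes from the KCSAN headers, and
the generated wrappers live in asm-generic/barrier.h):

  /* Stub so the sketch compiles without the kernel headers. */
  #define kcsan_mb()      do { } while (0)

  /* The arch provides the renamed low-level barrier ... */
  #define __mb()          __asm__ __volatile__("sync" : : : "memory")

  /* ... and the generic layer generates mb() with the KCSAN call in front. */
  #define mb()            do { kcsan_mb(); __mb(); } while (0)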

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-4-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Rohan McLure
2a7ce82dc4 powerpc/kcsan: Exclude udelay to prevent recursive instrumentation
In order for KCSAN to increase its likelihood of observing a data race,
it sets a watchpoint on memory accesses and stalls, allowing for
detection of conflicting accesses by other kernel threads or interrupts.

Stalls are implemented by injecting a call to udelay in instrumented code.
To prevent recursive instrumentation, exclude udelay from being instrumented.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Rohan McLure
2fb857bc9f powerpc/kcsan: Add exclusions from instrumentation
Exclude various incompatible compilation units from KCSAN
instrumentation.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-2-rmclure@linux.ibm.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
1ee4e35076 powerpc: Skip stack validation checking alternate stacks if they are not allocated
Stack validation in early boot can just bail out of checking alternate
stacks if they are not allocated yet. Checking against a NULL stack
could cause NULLish pointer values to be considered valid.
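
A simplified sketch of that bail-out (types and names are illustrative, not
the actual powerpc stack-walking code):

  #include <stdbool.h>
  #include <stddef.h>

  static bool valid_alt_stack(const char *sp, const char *stack_base,
                              size_t stack_size)
  {
          if (!stack_base)        /* alternate stack not allocated yet: bail */
                  return false;

          return sp >= stack_base && sp < stack_base + stack_size;
  }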

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
dc222fa773 powerpc/64: Move paca allocation to early_setup()
The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.

early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.

One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.

Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(); it just looks like all this code might have
been put here to allocate somewhere to store the boot CPU hardware id
while scanning the devtree.

Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.

The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
2023-02-10 22:19:56 +11:00
Nicholas Piggin
9fa24404f5 powerpc/64: Fix task_cpu in early boot when booting non-zero cpuid
powerpc/64 can boot on a non-zero SMP processor id. Initially, the boot
CPU is said to be "assumed to be 0" until early_init_devtree() discovers
the id from the device tree. That is not a good description because the
assumption can be wrong and that has to be handled; the better
description is that 0 is used as a placeholder, and things are fixed
after the real id is discovered.

smp_processor_id() is set to the boot cpuid, but task_cpu(current) is
not, which causes the smp_processor_id() == task_cpu(current) invariant
to be broken until init_idle() in sched_init().

This is quite fragile and could lead to subtle bugs in future. One bug
is that validate_sp_size uses task_cpu() to get the process stack, so
any stack trace from the booting CPU between early_init_devtree()
and sched_init() will have problems. Early on paca_ptrs[0] will be
poisoned, so that can cause machine checks dereferencing that memory
in real mode. Later, validating the current stack pointer against the
idle task of a different secondary will probably cause no stack trace
to be printed.

Fix this by setting thread_info->cpu right after smp_processor_id() is
set to the boot cpuid.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build as reported by sfr]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-3-npiggin@gmail.com
2023-02-10 22:19:38 +11:00
Nicholas Piggin
dea18da459 powerpc/64s: Fix stress_hpt memblock alloc alignment
The stress_hpt memblock allocation did not pass in an alignment,
which causes a stack dump in early boot (that I missed, oops).

Fixes: 6b34a099fa ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-2-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
ffc8e90dec powerpc/64e: Simplify address calculation in secondary hold loop
As the earlier comment explains, __secondary_hold_spinloop does not have
to be accessed at its virtual address, slightly simplifying code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-4-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
58f24eea52 powerpc/64s: Refactor initialisation after prom
Move some basic Book3S initialisation after prom to a function similar
to what Book3E looks like. Book3E returns from this function at the
virtual address mapping, and Book3S will do the same in a later change,
so making them look similar helps with that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-3-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
26d53a9c89 crypto: powerpc - Use address generation helper for asm
Replace open-coded toc-relative address calculation with helper macros;
commit dab3b8f4fd ("powerpc/64: asm use consistent global variable
declaration and access") made similar conversions already but missed
this one.

This allows the data addressing model to be changed more easily.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-2-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Frederic Barrat
e64e71056f powerpc/powernv/ioda: Skip unallocated resources when mapping to PE
pnv_ioda_setup_pe_res() calls opal to map a resource with a PE. However,
the code assumes the resource is allocated and it uses the resource
address to find out the segment(s) which need to be mapped to the
PE. In the unlikely case where the resource hasn't been allocated, the
computation for the segment number is garbage, which can lead to
invalid memory access and potentially a kernel crash, such as:

[ ] pci_bus 0002:02: Configuring PE for bus
[ ] pci 0002:02     : [PE# fc] Secondary bus 0x0000000000000002..0x0000000000000002 associated with PE#fc
[ ] BUG: Kernel NULL pointer dereference on write at 0x00000000
[ ] Faulting instruction address: 0xc00000000005eac4
[ ] Oops: Kernel access of bad area, sig: 7 [#1]
[ ] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ ] Modules linked in:
[ ] CPU: 12 PID: 1 Comm: swapper/20 Not tainted 5.10.50-openpower1 #2
[ ] NIP:  c00000000005eac4 LR: c00000000005ea44 CTR: 0000000030061b9c
[ ] REGS: c000200007383650 TRAP: 0300   Not tainted  (5.10.50-openpower1)
[ ] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44000224  XER: 20040000
[ ] CFAR: c00000000005eaa0 DAR: 0000000000000000 DSISR: 02080000 IRQMASK: 0
[ ] GPR00: c00000000005dd98 c0002000073838e0 c00000000185de00 c000200fff018960
[ ] GPR04: 00000000000000fc 0000000000000003 0000000000000000 0000000000000000
[ ] GPR08: 0000000000000000 0000000000000000 0000000000000000 9000000000001033
[ ] GPR12: 0000000031cb0000 c000000ffffe6a80 c000000000010a58 0000000000000000
[ ] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ ] GPR20: 0000000000000000 0000000000000000 0000000000000000 c00000000711e200
[ ] GPR24: 0000000000000100 c000200009501120 c00020000cee2800 00000000000003ff
[ ] GPR28: c000200fff018960 0000000000000000 c000200ffcb7fd00 0000000000000000
[ ] NIP [c00000000005eac4] pnv_ioda_setup_pe_res+0x94/0x1a0
[ ] LR [c00000000005ea44] pnv_ioda_setup_pe_res+0x14/0x1a0
[ ] Call Trace:
[ ] [c0002000073838e0] [c00000000005eb98] pnv_ioda_setup_pe_res+0x168/0x1a0 (unreliable)
[ ] [c000200007383970] [c00000000005dd98] pnv_pci_ioda_dma_dev_setup+0x43c/0x970
[ ] [c000200007383a60] [c000000000032cdc] pcibios_bus_add_device+0x78/0x18c
[ ] [c000200007383aa0] [c00000000028f2bc] pci_bus_add_device+0x28/0xbc
[ ] [c000200007383b10] [c00000000028f3a0] pci_bus_add_devices+0x50/0x7c
[ ] [c000200007383b50] [c00000000028f3c4] pci_bus_add_devices+0x74/0x7c
[ ] [c000200007383b90] [c00000000028f3c4] pci_bus_add_devices+0x74/0x7c
[ ] [c000200007383bd0] [c00000000069ad0c] pcibios_init+0xf0/0x104
[ ] [c000200007383c50] [c0000000000106d8] do_one_initcall+0x84/0x1c4
[ ] [c000200007383d20] [c0000000006910b8] kernel_init_freeable+0x264/0x268
[ ] [c000200007383dc0] [c000000000010a68] kernel_init+0x18/0x138
[ ] [c000200007383e20] [c00000000000cbfc] ret_from_kernel_thread+0x5c/0x80
[ ] Instruction dump:
[ ] 7f89e840 409d000c 7fbbf840 409c000c 38210090 4848f448 809c002c e95e0120
[ ] 7ba91764 38a00003 57a7043e 38c00000 <7c8a492e> 5484043e e87e0018 4bff23bd

Hitting the problem is not that easy. It was seen with a (semi-bogus)
PCI device with a class code of 0. The generic PCI framework doesn't
allocate resources in such a case.

The patch simply skips resources which are still flagged with
IORESOURCE_UNSET.

We don't have the problem with 64-bit mem resources, as the address of
the resource is checked to be within the range of the 64-bit mmio
window. See pnv_ioda_reserve_dev_m64_pe() and pnv_pci_is_m64().
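
In essence the guard looks like the following sketch (the struct and helper
are stand-ins, not the actual pnv_ioda code; the flag value matches
include/linux/ioport.h):

  #include <stdbool.h>

  #define IORESOURCE_UNSET 0x20000000     /* no address assigned yet */

  struct res_sketch {
          unsigned long start;
          unsigned long flags;
  };

  /* Skip resources the PCI core never allocated, so their (garbage)
   * addresses are never used to compute PE segment numbers. */
  static bool res_is_mappable(const struct res_sketch *r)
  {
          return r->flags && !(r->flags & IORESOURCE_UNSET);
  }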

Reported-by: Andrew Jeffery <andrew@aj.id.au>
Fixes: 23e79425fe ("powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230120093215.19496-1-fbarrat@linux.ibm.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
01f135506e powerpc/32: select HAVE_VIRT_CPU_ACCOUNTING_GEN
cputime_t is no longer a type, so VIRT_CPU_ACCOUNTING_GEN does not
have any effect on the type for 32-bit architectures, so there is
no reason it can't be supported.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095805.2823731-4-npiggin@gmail.com
2023-02-10 22:17:36 +11:00
Nicholas Piggin
5c4b710a81 powerpc/32: implement HAVE_CONTEXT_TRACKING_USER support
Context tracking involves tracking user, kernel, guest switches. 32-bit
shares interrupt and syscall entry and exit code (and context tracking
calls) with 64-bit, and KVM can not be selected if CONTEXT_TRACKING_USER
is enabled, so context tracking can be enabled for 32-bit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095805.2823731-3-npiggin@gmail.com
2023-02-10 22:17:35 +11:00
Nicholas Piggin
fb3b72a3f4 powerpc: Consolidate 32-bit and 64-bit interrupt_enter_prepare
There are two separate implementations for 32-bit and 64-bit which
mostly do the same thing. Consolidating on one implementation ends
up being smaller and simpler; only the irq soft-mask reconciliation
is specific to 64-bit.

There should be no real functional change with this patch, but it
does make the context tracking calls necessary for 32-bit to support
context tracking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121095805.2823731-2-npiggin@gmail.com
2023-02-10 22:17:35 +11:00
Kajol Jain
60bd7936f9 powerpc/hv-24x7: Fix pvr check when setting interface version
Commit ec3eb9d941 ("powerpc/perf: Use PVR rather than
oprofile field to determine CPU version") added usage
of pvr value instead of oprofile field to determine the
platform. In the hv-24x7 pmu driver code, the pvr check uses PVR_POWER8
when assigning the interface version for the power8 platform.
But power8 can also have other pvr values like PVR_POWER8E and
PVR_POWER8NVL. Hence the interface version won't be set
properly in the case of PVR_POWER8E and PVR_POWER8NVL.
Fix this issue by adding checks for PVR_POWER8E and
PVR_POWER8NVL as well.
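
The widened check amounts to something like the sketch below (the PVR family
constants match arch/powerpc/include/asm/reg.h; the helper name is
illustrative):

  #include <stdbool.h>

  #define PVR_POWER8E     0x004B
  #define PVR_POWER8NVL   0x004C
  #define PVR_POWER8      0x004D

  static bool pvr_is_power8_family(unsigned int pvr_ver)
  {
          return pvr_ver == PVR_POWER8 ||
                 pvr_ver == PVR_POWER8E ||
                 pvr_ver == PVR_POWER8NVL;
  }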

Fixes: ec3eb9d941 ("powerpc/perf: Use PVR rather than oprofile field to determine CPU version")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131184804.220756-1-kjain@linux.ibm.com
2023-02-10 22:17:35 +11:00
Christophe Leroy
19daf0aef8 powerpc/bpf/32: perform three operands ALU operations
When an ALU instruction is preceded by a MOV instruction
that just moves a source register into the destination register of
the ALU, replace that MOV+ALU pair with a single ALU operation
taking the source of the MOV as second source instead of using its
destination.

Before the change, code could look like the following, with
superfluous separate register move (mr) instructions.

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

With this commit, addition instructions take r30 and r29 directly.

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b6719beaf01f9dcbcdbb787ef67c4a2f8e3a4cb6.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
c88da29b4d powerpc/bpf/32: introduce a second source register for ALU operations
At the time being, all ALU operations are performed with the same source
and destination, requiring the source to be moved into the destination via
a separate register move, like:

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

Introduce a second source register to all ALU operations. For the time
being that second source register is made equal to the destination
register.

That change will allow the following patch to optimise the generated
code as:

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5aaaba50d9d6b4a0e9f0cd4a5e34101aca1e247.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
8616045fe7 powerpc/bpf/32: Optimise some particular const operations
Simplify multiplications and divisions with constants when the
constant is 1 or -1.

When the constant is a power of 2, replace them by bit shifts.
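
As a rough model of the strength reduction (plain C standing in for the
emitted PPC instructions; the real JIT emits shift and multiply instructions
rather than computing values):

  #include <stdbool.h>
  #include <stdint.h>

  static bool is_power_of_2(uint32_t v)
  {
          return v && !(v & (v - 1));
  }

  /* Unsigned multiply by a constant: *1 emits nothing, powers of two
   * become shifts, anything else falls back to a real multiply. */
  static uint32_t mul_const_model(uint32_t reg, uint32_t imm)
  {
          if (imm == 1)
                  return reg;
          if (is_power_of_2(imm))
                  return reg << __builtin_ctz(imm);
          return reg * imm;
  }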

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e53b1f4a4150ec6cabcaeeef82bf9c361b5f9204.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
d3921cbb6c powerpc/bpf: Only pad length-variable code at initial pass
Now that two real additional passes are performed when the BPF core
requests an extra pass, padding is not needed anymore except during the
initial pass done before memory allocation to count the maximum possible
program size.

So, only do the padding when 'image' is NULL.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/921851d6577badc1e6b08b270a0ced80a6a26d03.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
85e031154c powerpc/bpf: Perform complete extra passes to update addresses
BPF core calls the jit compiler again for an extra pass in order
to properly set subprog addresses.

Unlike other architectures, powerpc only updates the addresses
during that extra pass. It means that holes must have been left
in the code in order to enable the maximum possible instruction
size.

In order to avoid waste of space, and waste of CPU time on powerpc
processors on which the NOP instruction is not 0-cycle, perform
two real additional passes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d484a4ac95949ff55fc4344b674e7c0d3ddbfcd5.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
7dd0e28487 powerpc/bpf/32: BPF prog is never called with more than one arg
BPF progs are never called with more than one argument, plus the
tail call count as a second argument when needed.

So, there is no need to retrieve the 9th and 10th arguments (the 5th
64-bit argument) from the stack in the prologue.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89a200fb45048601475c092c5775294dee3886de.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:35 +11:00
Christophe Leroy
d084dcf256 powerpc/bpf/32: Only set a stack frame when necessary
Until now a stack frame was set up at all times due to the need
to keep the tail call counter on the stack.

But since commit 89d21e259a ("powerpc/bpf/32: Fix Oops on tail call
tests") the tail call counter is passed via register r4. It is therefore
not necessary anymore to have a stack frame for that.

Just like PPC64, implement bpf_has_stack_frame() and only set the frame
when needed.

The difference with PPC64 is that PPC32 doesn't have a redzone, so
the stack frame is required as soon as non-volatile registers are used or
when the tail call count is set up.
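
Conceptually the decision is as sketched below (field names are made up for
illustration; the real implementation lives in the PPC32 BPF JIT context):

  #include <stdbool.h>

  struct jit_ctx_sketch {
          unsigned long seen_nvreg_mask;  /* non-volatile GPRs the prog used */
          bool seen_tailcall;             /* tail call count had to be set up */
  };

  /* PPC32 has no redzone, so a frame is needed as soon as either is true. */
  static bool bpf32_has_stack_frame(const struct jit_ctx_sketch *ctx)
  {
          return ctx->seen_nvreg_mask != 0 || ctx->seen_tailcall;
  }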

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix commit reference in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62d7b654a3cfe73d998697cb29bbc5ffd89bfdb1.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:34 +11:00
Christophe Leroy
6376ed8fec powerpc/bpf/32: No need to zeroise r4 when not doing tail call
r4 is cleared at function entry and used as the tail call count.

But when the function does not perform a tail call, r4 is
ignored, so there is no need to clear it.

Replace it with a NOP in that case.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9c5440b2b6d90a78600257433ac499b5c5101fbb.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:34 +11:00
Christophe Leroy
d9ab6da64f powerpc: Remove __kernel_text_address() in show_instructions()
That test was introduced in 2006 by
commit 00ae36de49 ("[POWERPC] Better check in show_instructions").
At that time, there were no BPF progs.

As seen in the message of commit 89d21e259a ("powerpc/bpf/32: Fix Oops
on tail call tests"), when a page fault occurs in test_bpf.ko for
instance, the code is dumped as XXXXXXXXs. Although
__kernel_text_address() checks is_bpf_text_address(), it seems it is
not enough.

Today, show_instructions() uses get_kernel_nofault() to read the code,
so there is no real need for additional verifications.

ARM64 and x86 don't do any additional check before dumping
instructions. Do the same and remove __kernel_text_address()
in show_instructions().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10 22:17:34 +11:00
Ganesh Goudar
2115732e54 powerpc/mce: log the error for all unrecoverable errors
For all unrecoverable errors we fail to log the error, since
machine_check_log_err() is not called for unrecoverable errors.
machine_check_log_err() is called from the deferred handler, and to run
deferred handlers we have to raise irq work from the exception handler.

For recoverable errors the exception vector code takes care of
running deferred handlers.

For unrecoverable errors, raise irq work in save_mce_event(),
so that we log the error from the MCE deferred handler.

Log without this change

 MCE: CPU27: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
 MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4]
 MCE: CPU27: Initiator CPU
 MCE: CPU27: Unknown

Log with this change

 MCE: CPU24: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
 MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48]
 MCE: CPU24: Initiator CPU
 MCE: CPU24: Unknown
 RTAS: event: 5, Type: Platform Error (224), Severity: 3

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230201095933.129482-1-ganeshgr@linux.ibm.com
2023-02-10 22:17:34 +11:00
Benjamin Gray
8d7253dc44 selftests/powerpc: Add automatically allocating read_file
A couple of tests roll their own auto-allocating file read logic.

Add a generic implementation and convert them to use it.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203003947.38033-6-bgray@linux.ibm.com
2023-02-09 23:56:45 +11:00
Benjamin Gray
5c20de5788 selftests/powerpc: Add {read,write}_{long,ulong}
Add helper functions to read and write (unsigned) long values directly
from/to files. One of the kernel interfaces uses hex strings, so we need
to allow passing a base too.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203003947.38033-5-bgray@linux.ibm.com
2023-02-09 23:56:45 +11:00
Benjamin Gray
d1bc05b7bf selftests/powerpc: Parse long/unsigned long value safely
Often a file is expected to hold an integral value. Existing functions
will use a C stdlib function like atoi or strtol to parse the file.
These operations are error prone, with complicated error conditions
(atoi returns 0 if not a number, and has undefined behaviour if out of
range; strtol returns 0 if not a number, returns LONG_MIN/MAX if out of
range, and sets errno to ERANGE).
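
A minimal userspace sketch of the safer pattern (the helper name and exact
error handling are illustrative, not necessarily the selftest API that was
added):

  #include <errno.h>
  #include <stdlib.h>

  /* Return 0 and store the value via *result, or return -1 on any parse
   * error, instead of overloading 0/LONG_MIN/LONG_MAX as error values. */
  static int parse_long_sketch(const char *buf, long *result, int base)
  {
          char *end;
          long val;

          errno = 0;
          val = strtol(buf, &end, base);
          if (errno != 0 || end == buf || (*end != '\0' && *end != '\n'))
                  return -1;

          *result = val;
          return 0;
  }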

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203003947.38033-4-bgray@linux.ibm.com
2023-02-09 23:56:45 +11:00
Benjamin Gray
121d340be9 selftests/powerpc: Add read/write debugfs file, int
Debugfs files are not always integers, so make *_file return/write a
byte buffer, and *_int deal with int values specifically. This increases
consistency with the other file read/write helpers.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203003947.38033-3-bgray@linux.ibm.com
2023-02-09 23:56:45 +11:00
Benjamin Gray
a974f0c131 selftests/powerpc: Add generic read/write file util
File read/write is reimplemented in about 5 different ways in the
various PowerPC selftests. This indicates it should be a common util.

Add a common read_file / write_file implementation and convert users
to it where (easily) possible.
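
A cut-down sketch of what such a common helper looks like (the signature and
error handling are illustrative; the real selftest utility may differ):

  #include <fcntl.h>
  #include <stddef.h>
  #include <unistd.h>

  /* Read up to maxlen - 1 bytes from path into buf and NUL-terminate. */
  static int read_file_sketch(const char *path, char *buf, size_t maxlen)
  {
          ssize_t n;
          int fd;

          fd = open(path, O_RDONLY);
          if (fd < 0)
                  return -1;

          n = read(fd, buf, maxlen - 1);
          close(fd);
          if (n < 0)
                  return -1;

          buf[n] = '\0';
          return 0;
  }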

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203003947.38033-2-bgray@linux.ibm.com
2023-02-09 23:56:45 +11:00
Greg Kroah-Hartman
b505063910 powerpc/iommu: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
2023-02-08 21:42:12 +11:00
Nicholas Piggin
dcfecb989a powerpc/64s/radix: Remove TLB_FLUSH_ALL test from range flushes
This looks like it came across from x86, but x86 uses TLB_FLUSH_ALL as
a parameter to internal functions. Powerpc never sets it anywhere.

Remove the associated logic and leave a warning for now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203111718.1149852-4-npiggin@gmail.com
2023-02-08 21:42:12 +11:00
Nicholas Piggin
d01dc25e47 powerpc/64s/radix: mm->context.id should always be valid
The MMU_NO_CONTEXT checks are an unnecessary complication. Make
these warn to prepare to remove them in future.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203111718.1149852-3-npiggin@gmail.com
2023-02-08 21:42:12 +11:00
Nicholas Piggin
45abf5d94b powerpc/64s/radix: Remove need_flush_all test from radix__tlb_flush
need_flush_all is only set by arch code to instruct generic tlb_flush
to flush all. It is never set by powerpc, so it can be removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203111718.1149852-2-npiggin@gmail.com
2023-02-08 21:42:12 +11:00
Christophe Leroy
4b10306e98 powerpc: Disable CPU unknown by CLANG when CC_IS_CLANG
CLANG only knows the following CPUs:

generic, 440, 450, 601, 602, 603, 603e, 603ev, 604, 604e, 620, 630,
g3, 7400, g4, 7450, g4+, 750, 8548, 970, g5, a2, e500, e500mc, e5500,
power3, pwr3, power4, pwr4, power5, pwr5, power5x, pwr5x, power6,
pwr6, power6x, pwr6x, power7, pwr7, power8, pwr8, power9, pwr9,
power10, pwr10, powerpc, ppc, ppc32, powerpc64, ppc64, powerpc64le,
ppc64le, future

Disable other ones when CC_IS_CLANG.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e62892e32c14a7a5738c597e39e0082cb0abf21c.1675335659.git.christophe.leroy@csgroup.eu
2023-02-08 21:42:12 +11:00
Pali Rohár
5d2eb73aa0 powerpc/pci: Add option for using pci_to_OF_bus_map
The "pci-OF-bus-map" property was declared deprecated in 2006 [1] and to
the best of everyone's knowledge is not used by anything anymore [2].

The creation of the property was disabled on powermac (arch/powerpc) in
2005 by commit 35499c0195 ("powerpc: Merge in 64-bit powermac
support."). But it is still created by default on CHRP.

On powermac the actual map (pci_to_OF_bus_map) is still used by default,
even though the device tree property is not created.

Add an option to enable/disable use of the pci_to_OF_bus_map, and
creation of the property (on CHRP).

Disabling the option allows enabling CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT
which allows "normal" bus numbering and more than 256 buses, like 64-bit
and other architectures.

Mark the new option as default n; the intention is that the option and
the code will be removed in a future release.

[1]: https://lore.kernel.org/linuxppc-dev/1148016268.13249.14.camel@localhost.localdomain/
[2]: https://lore.kernel.org/all/575f239205e8635add81c9f902b7d9db7beb83ea.camel@kernel.crashing.org/

Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Reword commit & help text, shrink option name, rework to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206113902.1857123-1-mpe@ellerman.id.au
2023-02-07 20:15:23 +11:00
Nicholas Piggin
2ea31e2e62 powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
The RFI and STF security mitigation options can flip the
interrupt_exit_not_reentrant static branch condition concurrently with
the interrupt exit code which tests that branch.

Interrupt exit tests this condition to set MSR[EE|RI] for exit, then
again in the case a soft-masked interrupt is found pending, to recover
the MSR so the interrupt can be replayed before attempting to exit
again. If the condition changes between these two tests, the MSR and irq
soft-mask state will become corrupted, leading to warnings and possible
crashes. For example, if the branch is initially true then false,
MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get
enabled, leading to warnings in irq_64.c.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
2023-02-07 10:13:33 +11:00
Randy Dunlap
97e45d469e powerpc/kexec_file: fix implicit decl error
kexec (PPC64) code calls memory_hotplug_max(). Add the header
declaration for it from <asm/mmzone.h>. Using <linux/mmzone.h> does not
work since the #include for <asm/mmzone.h> depends on CONFIG_NUMA=y,
which is not always set.

Fixes this build error/warning:

  arch/powerpc/kexec/file_load_64.c: In function 'kexec_extra_fdt_size_ppc64':
  arch/powerpc/kexec/file_load_64.c:993:33: error: implicit declaration of function 'memory_hotplug_max'
  993 |                 usm_entries = ((memory_hotplug_max() / drmem_lmb_size()) +
      |                                 ^~~~~~~~~~~~~~~~~~

Fixes: fc546faa55 ("powerpc/kexec_file: Count hot-pluggable memory in FDT estimate")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230204172206.7662-1-rdunlap@infradead.org
2023-02-06 22:00:30 +11:00
Michael Ellerman
e33416fca8 powerpc: Don't select ARCH_WANTS_NO_INSTR
Commit 41b7a347bf ("powerpc: Book3S 64-bit outline-only KASAN
support") added a select of ARCH_WANTS_NO_INSTR, because it also added
some uses of noinstr. However noinstr is always defined, regardless of
ARCH_WANTS_NO_INSTR, so there's no need to select it just for that.

As PeterZ says [1]:
  Note that by selecting ARCH_WANTS_NO_INSTR you effectively state to
  abide by its rules.

As of now the powerpc code does not abide by those rules, and trips some
new warnings added by Peter in linux-next.

So until the code can be fixed to avoid those warnings, disable
ARCH_WANTS_NO_INSTR.

Note that ARCH_WANTS_NO_INSTR is also used to gate building KCOV and
parts of KCSAN. However none of the noinstr annotations in powerpc were
added for KCOV or KCSAN, instead instrumentation is blocked at the file
level using KCOV_INSTRUMENT_foo.o := n.

[1]: https://lore.kernel.org/linuxppc-dev/Y9t6yoafrO5YqVgM@hirez.programming.kicks-ass.net

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2023-02-02 21:27:35 +11:00
Michael Ellerman
1665c027af powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
Commit baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
removed some empty hash MMU flushing routines, but got a bit overeager
and also removed the call to hash__tlb_flush() from tlb_flush().

In regular use this doesn't lead to any noticeable breakage, which is a
little concerning. Presumably there are flushes happening via other
paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.

Fix it by reinstating the call to hash__tlb_flush().

Fixes: baf1ed24b2 ("powerpc/mm: Remove empty hash__ functions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
2023-02-02 13:25:47 +11:00
Sourabh Jain
fc546faa55 powerpc/kexec_file: Count hot-pluggable memory in FDT estimate
On systems where online memory is less than the max memory, the
kexec_file_load system call may fail to load the kdump kernel with the
below errors:

    "Failed to update fdt with linux,drconf-usable-memory property"
    "Error setting up usable-memory property for kdump kernel"

This happens because the size estimation for usable memory properties
for the kdump kernel's FDT is based on the online memory whereas the
usable memory properties include max memory. In short, the hot-pluggable
memory is not accounted for while estimating the size of the usable
memory properties.

The issue is addressed by calculating usable memory property size using
max hotplug address instead of the last online memory address.

Fixes: 2377c92e37 ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230131030615.729894-1-sourabhjain@linux.ibm.com
2023-02-01 13:04:40 +11:00
Michael Ellerman
111bcb3738 powerpc/64s/radix: Fix RWX mapping with relocated kernel
If a relocatable kernel is loaded at a non-zero address and told not to
relocate to zero (kdump or RELOCATABLE_TEST), the mapping of the
interrupt code at zero is left with RWX permissions.

That is a security weakness, and leads to a warning at boot if
CONFIG_DEBUG_WX is enabled:

  powerpc/mm: Found insecure W+X mapping at address 00000000056435bc/0xc000000000000000
  WARNING: CPU: 1 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:193 note_page+0x484/0x4c0
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc1-00001-g8ae8e98aea82-dirty #175
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-dd0dca hv:linux,kvm pSeries
  NIP:  c0000000004a1c34 LR: c0000000004a1c30 CTR: 0000000000000000
  REGS: c000000003503770 TRAP: 0700   Not tainted  (6.2.0-rc1-00001-g8ae8e98aea82-dirty)
  MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24000220  XER: 00000000
  CFAR: c000000000545a58 IRQMASK: 0
  ...
  NIP note_page+0x484/0x4c0
  LR  note_page+0x480/0x4c0
  Call Trace:
    note_page+0x480/0x4c0 (unreliable)
    ptdump_pmd_entry+0xc8/0x100
    walk_pgd_range+0x618/0xab0
    walk_page_range_novma+0x74/0xc0
    ptdump_walk_pgd+0x98/0x170
    ptdump_check_wx+0x94/0x100
    mark_rodata_ro+0x30/0x70
    kernel_init+0x78/0x1a0
    ret_from_kernel_thread+0x5c/0x64

The fix has two parts. Firstly the pages from zero up to the end of
interrupts need to be marked read-only, so that they are left with R-X
permissions. Secondly the mapping logic needs to be taught to ensure
there is a page boundary at the end of the interrupt region, so that the
permission change only applies to the interrupt text, and not the region
following it.

Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-2-mpe@ellerman.id.au
2023-01-31 21:37:39 +11:00
Michael Ellerman
98d0219e04 powerpc/64s/radix: Fix crash with unaligned relocated kernel
If a relocatable kernel is loaded at an address that is not 2MB aligned
and told not to relocate to zero, the kernel can crash due to
mark_rodata_ro() incorrectly changing some read-write data to read-only.

Scenarios where the misalignment can occur are when the kernel is
loaded by kdump or using the RELOCATABLE_TEST config option.

Example crash with the kernel loaded at 5MB:

  Run /sbin/init as init process
  BUG: Unable to handle kernel data access on write at 0xc000000000452000
  Faulting instruction address: 0xc0000000005b6730
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
  NIP:  c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
  REGS: c000000004503250 TRAP: 0300   Not tainted  (6.2.0-rc1-00011-g349188be4841)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44288480  XER: 00000000
  CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
  ...
  NIP memset+0x68/0x104
  LR  zero_user_segments.constprop.0+0xa8/0xf0
  Call Trace:
    ext4_mpage_readpages+0x7f8/0x830
    ext4_readahead+0x48/0x60
    read_pages+0xb8/0x380
    page_cache_ra_unbounded+0x19c/0x250
    filemap_fault+0x58c/0xae0
    __do_fault+0x60/0x100
    __handle_mm_fault+0x1230/0x1a40
    handle_mm_fault+0x120/0x300
    ___do_page_fault+0x20c/0xa80
    do_page_fault+0x30/0xc0
    data_access_common_virt+0x210/0x220

This happens because mark_rodata_ro() tries to change permissions on the
range _stext..__end_rodata, but _stext sits in the middle of the 2MB
page from 4MB to 6MB:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)

The logic that changes the permissions assumes the linear mapping was
split correctly at boot, so it marks the entire 2MB page read-only. That
leads to the write fault above.

To fix it, the boot time mapping logic needs to consider that if the
kernel is running at a non-zero address then _stext is a boundary where
it must split the mapping.
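
The boundary rule reduces to something like this sketch (addresses are plain
integers here; the real logic sits in the radix linear-mapping setup):

  #include <stdint.h>

  /* Clamp a candidate mapping block at the given boundary (e.g. _stext),
   * so no large page straddles the start of the kernel text. */
  static uint64_t clamp_block_end(uint64_t start, uint64_t end,
                                  uint64_t boundary)
  {
          if (start < boundary && end > boundary)
                  return boundary;
          return end;
  }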

That leads to the mapping being split correctly, allowing the rodata
permission change to happen correctly, with no spillover:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
  radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)

If the kernel is loaded at a 2MB aligned address, the mapping continues
to use 2MB pages as before:

  radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
  radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages

Fixes: c55d7b5e64 ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
2023-01-31 21:37:39 +11:00
Michael Ellerman
7294194b47 powerpc/kexec_file: Fix division by zero in extra size estimation
In kexec_extra_fdt_size_ppc64() there's logic to estimate how much
extra space will be needed in the device tree for some memory related
properties.

That logic uses the size of RAM divided by drmem_lmb_size() to do the
estimation. However drmem_lmb_size() can be zero if the machine has no
hotpluggable memory configured, which is the case when booting with qemu
and no maxmem=x parameter is passed (the default).

The division by zero is reported by UBSAN, and can also lead to an
overflow and a warning from kvmalloc, and kdump kernel loading fails:

  WARNING: CPU: 0 PID: 133 at mm/util.c:596 kvmalloc_node+0x15c/0x160
  Modules linked in:
  CPU: 0 PID: 133 Comm: kexec Not tainted 6.2.0-rc5-03455-g07358bd97810 #223
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-dd0dca pSeries
  NIP:  c00000000041ff4c LR: c00000000041fe58 CTR: 0000000000000000
  REGS: c0000000096ef750 TRAP: 0700   Not tainted  (6.2.0-rc5-03455-g07358bd97810)
  MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24248242  XER: 2004011e
  CFAR: c00000000041fed0 IRQMASK: 0
  ...
  NIP kvmalloc_node+0x15c/0x160
  LR  kvmalloc_node+0x68/0x160
  Call Trace:
    kvmalloc_node+0x68/0x160 (unreliable)
    of_kexec_alloc_and_setup_fdt+0xb8/0x7d0
    elf64_load+0x25c/0x4a0
    kexec_image_load_default+0x58/0x80
    sys_kexec_file_load+0x5c0/0x920
    system_call_exception+0x128/0x330
    system_call_vectored_common+0x15c/0x2ec

To fix it, skip the calculation if drmem_lmb_size() is zero.
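
The guard boils down to the following sketch (stand-in helpers return fixed
values so the snippet stands alone; the real code uses memory_hotplug_max()
and drmem_lmb_size()):

  #include <stdint.h>

  static uint64_t hotplug_max_sketch(void) { return 0; }
  static uint64_t lmb_size_sketch(void)    { return 0; }  /* no maxmem= given */

  static uint64_t usm_entries_estimate(void)
  {
          uint64_t lmb_size = lmb_size_sketch();

          if (!lmb_size)          /* no hotpluggable memory: skip the estimate */
                  return 0;

          return hotplug_max_sketch() / lmb_size;
  }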

Fixes: 2377c92e37 ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230130014707.541110-1-mpe@ellerman.id.au
2023-01-31 21:37:36 +11:00