Commit graph

1072691 commits

Author SHA1 Message Date
Haren Myneni
b903737bc5 powerpc/pseries/vas: sysfs interface to export capabilities
The hypervisor provides the available VAS GZIP capabilities such
as default or QoS window type and the target available credits in
each type. This patch creates sysfs entries and exports the target,
used and the available credits for each feature.

This interface can be used by the user space to determine the credits
usage or to set the target credits in the case of QoS type (for DLPAR).

/sys/devices/vas/vas0/gzip/default_capabilities (default GZIP capabilities)
	nr_total_credits /* Total credits available. Can be
			 /* changed with DLPAR operation */
	nr_used_credits  /* Used credits */

/sys/devices/vas/vas0/gzip/qos_capabilities (QoS GZIP capabilities)
	nr_total_credits
	nr_used_credits

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/702d8b626ebfac2b52f4995eebeafe1c9a6fcb75.camel@linux.ibm.com
2022-03-08 00:04:56 +11:00
Haren Myneni
c656cfe571 powerpc/pseries/vas: Reopen windows with DLPAR core add
VAS windows can be closed in the hypervisor due to lost credits
when the core is removed and the kernel gets fault for NX
requests on these inactive windows. If the NX requests are
issued on these inactive windows, OS gets page faults and the
paste failure will be returned to the user space. If the lost
credits are available later with core add, reopen these windows
and set them active. Later when the OS sees page faults on these
active windows, it creates mapping on the new paste address.
Then the user space can continue to use these windows and send
HW compression requests to NX successfully.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9f360e21355e6826142c81146acfa9b60bc7ecc.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
8ef7b9e176 powerpc/pseries/vas: Close windows with DLPAR core removal
The hypervisor assigns vas credits (windows) for each LPAR based
on the number of cores configured in that system. The OS is
expected to release credits when cores are removed, and may
allocate more when cores are added. So there is a possibility of
using excessive credits (windows) in the LPAR and the hypervisor
expects the system to close the excessive windows so that NX load
can be equally distributed across all LPARs in the system.

When the OS closes the excessive windows in the hypervisor,
it sets the window status inactive and invalidates window
virtual address mapping. The user space receives paste instruction
failure if any NX requests are issued on the inactive window.
Then the user space can use with the available open windows or
retry NX requests until this window active again.

This patch also adds the notifier for core removal/add to close
windows in the hypervisor if the system lost credits (core
removal) and reopen windows in the hypervisor when the previously
lost credits are available.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/108928f9c00a48cc6a722315d482d07cf66acf5a.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
6a8d4ca891 powerpc/vas: Map paste address only if window is active
The paste address mapping is done with mmap() after the window is
opened with ioctl. The partition has to close VAS windows in the
hypervisor if it lost credits due to DLPAR core removal. But the
kernel marks these windows inactive until the previously lost
credits are available later. If the window is inactive due to
DLPAR after this mmap(), the paste instruction returns failure
until the the OS reopens this window again.

Before the user space issuing mmap(), there is a possibility of
happening DLPAR core removal event which causes the corresponding
window inactive. So if the window is not active, return mmap()
failure with -EACCES and expects the user space reissue mmap()
when the window is active or open a new window when the credit
is available.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbb203c26b324534e25658cb1dbbcb5160a2f93a.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
b5c63d90cc powerpc/vas: Return paste instruction failure if no active window
The VAS window may not be active if the system looses credits and
the NX generates page fault when it receives request on unmap
paste address.

The kernel handles the fault by remap new paste address if the
window is active again, Otherwise return the paste instruction
failure if the executed instruction that caused the fault was
a paste.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/492b9aefd593061d51dda67ee4d2fc449c000dce.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
1fe3a33ba0 powerpc/vas: Add paste address mmap fault handler
The user space opens VAS windows and issues NX requests by pasting
CRB on the corresponding paste address mmap. When the system lost
credits due to core removal, the kernel has to close the window in
the hypervisor and make the window inactive by unmapping this paste
address. Also the OS has to handle NX request page faults if the user
space issue NX requests.

This handler maps the new paste address with the same VMA when the
window is active again (due to core add with DLPAR). Otherwise
returns paste failure.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3956e1c1fdfde69127055ff1c0256c7d71104030.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
976410cd2c powerpc/pseries/vas: Save PID in pseries_vas_window struct
The kernel sets the VAS window with PID when it is opened in
the hypervisor. During DLPAR operation, windows can be closed and
reopened in the hypervisor when the credit is available. So saves
this PID in pseries_vas_window struct when the window is opened
initially and reuse it later during DLPAR operation.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a57cbe6d292fe49ad55a0b49c5679d6a24d8fe73.camel@linux.ibm.com
2022-03-08 00:04:55 +11:00
Haren Myneni
40562fe4fa powerpc/pseries/vas: Use common names in VAS capability structure
nr_total/nr_used_credits provides credits usage to user space
via sysfs and the same interface can be used on PowerNV in
future. Changed with proper naming so that applicable on both
pseries and PowerNV.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f4313e9f198ee4f8d4fa4d015d8d1873e17851e6.camel@linux.ibm.com
2022-03-08 00:04:54 +11:00
Michael Ellerman
9ef78b6293 Merge branch 'topic/ppc-kvm' into next
Merge our topic branch containing powerpc KVM related commits.

Alexey Kardashevskiy (1):
      KVM: PPC: Merge powerpc's debugfs entry content into generic entry

Fabiano Rosas (9):
      KVM: PPC: Book3S HV: Stop returning internal values to userspace
      KVM: PPC: Fix vmx/vsx mixup in mmio emulation
      KVM: PPC: mmio: Reject instructions that access more than mmio.data size
      KVM: PPC: mmio: Return to guest after emulation failure
      KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure
      KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
      KVM: PPC: Book3S HV: Delay setting of kvm ops
      KVM: PPC: Book3S HV: Free allocated memory if module init fails
      KVM: PPC: Decrement module refcount if init_vm fails

Jason Wang (1):
      powerpc/kvm: no need to initialise statics to 0

Nour-eddine Taleb (1):
      KVM: PPC: Book3S HV: remove unnecessary casts
2022-03-08 00:02:29 +11:00
Michael Ellerman
4bc06c59f6 Merge branch 'topic/func-desc-lkdtm' into next
Merge a topic branch we are maintaining with some cross-architecture
changes to function descriptor handling and their use in LKDTM.

From Christophe's cover letter:

Fix LKDTM for PPC64/IA64/PARISC

PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work on those
three architectures because LKDTM messes up function descriptors with
functions.

This series does some cleanup in the three architectures and refactors
function descriptors so that it can then easily use it in a generic way
in LKDTM.
2022-03-07 23:34:32 +11:00
Nour-eddine Taleb
e40b38a41c KVM: PPC: Book3S HV: remove unnecessary casts
Remove unnecessary casts, from "void *" to "struct kvmppc_xics *"

Signed-off-by: Nour-eddine Taleb <kernel.noureddine@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303143416.201851-1-kernel.noureddine@gmail.com
2022-03-04 12:58:46 +11:00
Anders Roxell
8219d31eff powerpc/lib/sstep: Fix build errors with newer binutils
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:10576: Error: unrecognized opcode: `stbcx.'
  {standard input}:10680: Error: unrecognized opcode: `lharx'
  {standard input}:10694: Error: unrecognized opcode: `lbarx'

Rework to add assembler directives [1] around the instruction.  The
problem with this might be that we can trick a power6 into
single-stepping through an stbcx. for instance, and it will execute that
in kernel mode.

[1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Fixes: 350779a29f ("powerpc: Handle most loads and stores in instruction emulation code")
Cc: stable@vger.kernel.org # v4.14+
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-3-anders.roxell@linaro.org
2022-03-01 23:51:09 +11:00
Anders Roxell
8667d0d64d powerpc: Fix build errors with newer binutils
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian
2.37.90.20220207) the following build error shows up:

  {standard input}: Assembler messages:
  {standard input}:1190: Error: unrecognized opcode: `stbcix'
  {standard input}:1433: Error: unrecognized opcode: `lwzcix'
  {standard input}:1453: Error: unrecognized opcode: `stbcix'
  {standard input}:1460: Error: unrecognized opcode: `stwcix'
  {standard input}:1596: Error: unrecognized opcode: `stbcix'
  ...

Rework to add assembler directives [1] around the instruction. Going
through them one by one shows that the changes should be safe.  Like
__get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(),
which according to the name is specific to power9.  And __raw_rm_read*()
are only called in things that are powernv or book3s_hv specific.

[1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo

Cc: stable@vger.kernel.org
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Make commit subject more descriptive]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-2-anders.roxell@linaro.org
2022-03-01 23:51:08 +11:00
Anders Roxell
a633cb1edd powerpc/lib/sstep: Fix 'sthcx' instruction
Looks like there been a copy paste mistake when added the instruction
'stbcx' twice and one was probably meant to be 'sthcx'. Changing to
'sthcx' from 'stbcx'.

Fixes: 350779a29f ("powerpc: Handle most loads and stores in instruction emulation code")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224162215.3406642-1-anders.roxell@linaro.org
2022-03-01 23:51:08 +11:00
Michael Ellerman
2863dd2db2 powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit
When CONFIG_GENERIC_CPU=y (true for all our defconfigs) we pass
-mcpu=powerpc64 to the compiler, even when we're building a 32-bit
kernel.

This happens because we have an ifdef CONFIG_PPC_BOOK3S_64/else block in
the Makefile that was written before 32-bit supported GENERIC_CPU. Prior
to that the else block only applied to 64-bit Book3E.

The GCC man page says -mcpu=powerpc64 "[specifies] a pure ... 64-bit big
endian PowerPC ... architecture machine [type], with an appropriate,
generic processor model assumed for scheduling purposes."

It's unclear how that interacts with -m32, which we are also passing,
although obviously -m32 is taking precedence in some sense, as the
32-bit kernel only contains 32-bit instructions.

This was noticed by inspection, not via any bug reports, but it does
affect code generation. Comparing before/after code generation, there
are some changes to instruction scheduling, and the after case (with
-mcpu=powerpc64 removed) the compiler seems more keen to use r8.

Fix it by making the else case only apply to Book3E 64, which excludes
32-bit.

Fixes: 0e00a8c9fd ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220215112858.304779-1-mpe@ellerman.id.au
2022-03-01 23:51:04 +11:00
Daniel Henrique Barboza
749ed4a206 powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()
Executing node_set_online() when nid = NUMA_NO_NODE results in an
undefined behavior. node_set_online() will call node_set_state(), into
__node_set(), into set_bit(), and since NUMA_NO_NODE is -1 we'll end up
doing a negative shift operation inside
arch/powerpc/include/asm/bitops.h. This potential UB was detected
running a kernel with CONFIG_UBSAN.

The behavior was introduced by commit 10f78fd0da ("powerpc/numa: Fix a
regression on memoryless node 0"), where the check for nid > 0 was
removed to fix a problem that was happening with nid = 0, but the result
is that now we're trying to online NUMA_NO_NODE nids as well.

Checking for nid >= 0 will allow node 0 to be onlined while avoiding
this UB with NUMA_NO_NODE.

Fixes: 10f78fd0da ("powerpc/numa: Fix a regression on memoryless node 0")
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220224182312.1012527-1-danielhb413@gmail.com
2022-03-01 23:41:01 +11:00
Christophe Leroy
973e2e6462 powerpc/interrupt: Remove struct interrupt_state
Since commit ceff77efa4 ("powerpc/64e/interrupt: Use new interrupt
context tracking scheme") struct interrupt_state has been empty and
unused.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1d862ce3eab3da6ca7ac47d4a78a18f154462511.1645806970.git.christophe.leroy@csgroup.eu
2022-03-01 23:41:00 +11:00
Hari Bathini
607451ce0a powerpc/fadump: register for fadump as early as possible
Crash recovery (fadump) is setup in the userspace by some service. This
service rebuilds initrd with dump capture capability, if it is not
already dump capture capable before proceeding to register for firmware
assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the
kernel with crash recovery support does not have to wait for userspace
configuration. So, register for fadump while setting it up itself. This
can at worst lead to a scenario, where /proc/vmcore is ready afer crash
but the initrd does not know how/where to offload it, which is always
better than not having a /proc/vmcore at all due to incomplete
configuration in the userspace at the time of crash.

Commit 0823c68b05 ("powerpc/fadump: re-register firmware-assisted dump
if already registered") ensures this change does not break userspace.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[mpe: Reword comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220201105305.155511-1-hbathini@linux.ibm.com
2022-03-01 23:41:00 +11:00
Kajol Jain
29cf373c57 selftests/powerpc/pmu: Add interface test for mmcra register fields
The testcase uses event code 0x35340401e0 to verify the settings for
different fields in Monitor Mode Control Register A (MMCRA). The fields
include thresh_start, thresh_stop thresh_select, sdar mode, sample and
marked bit. Checks if these fields are translated correctly via perf
interface to MMCRA.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-21-kjain@linux.ibm.com
2022-03-01 23:40:58 +11:00
Kajol Jain
02f02feb6b selftests/powerpc/pmu/: Add interface test for mmcr3_src fields
The testcase uses event code 0x1340000001c040 to verify the settings for
different src fields in Monitor Mode Control Register 3 (MMCR3). Checks
if these fields are translated correctly via perf interface to MMCR3 on
ISA v3.1 platform.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-20-kjain@linux.ibm.com
2022-03-01 23:40:47 +11:00
Madhavan Srinivasan
9ee241f1b1 selftests/powerpc/pmu/: Add interface test for mmcr2_fcs_fch fields
The testcases uses cycles event to verify the freeze counter settings in
Monitor Mode Control Register 2 (MMCR2). Event modifier (exclude_kernel)
setting is used for the event attribute to check the FCxS and FCxH (
Freeze counter in privileged and hypervisor state ) settings via perf
interface.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Add error checking, check MSR for MSR_HV, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-19-kjain@linux.ibm.com
2022-03-01 23:40:36 +11:00
Madhavan Srinivasan
ac575b2606 selftests/powerpc/pmu/: Add interface test for mmcr2_l2l3 field
The testcases uses event code 0x010000046080 to verify the l2l3 bit
setting for Monitor Mode Control Register 2 (MMCR2). check if this bit
is set correctly via perf interface in ISA v3.1 platform.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-18-kjain@linux.ibm.com
2022-03-01 23:40:26 +11:00
Athira Rajeev
2becea3b6a selftests/powerpc/pmu/: Add interface test for mmcr1_comb field
The testcase uses event code "0x26880" to verify the settings for
different fields in Monitor Mode Control Register 1 (MMCR1). The field
include PMCxCOMB. Checks if this field are translated correctly via perf
interface to MMCR1

Add selftest for mmcr1 comb field.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-16-kjain@linux.ibm.com
2022-03-01 23:40:16 +11:00
Athira Rajeev
6e11374b08 selftests/powerpc/pmu/: Add interface test for mmcr0_pmc56 using pmc5
The testcase uses event code 0x500fa to verify the FC5-6 bit setting in
Monitor Mode Control Register 0 (MMCR0). Check if FC5-6 bit is not set
in MMCR0 when using Performance Monitor Counter 5 and 6 (PMC5 and PMC6).

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-15-kjain@linux.ibm.com
2022-03-01 23:40:06 +11:00
Athira Rajeev
d5172f2585 selftests/powerpc/pmu/: Add interface test for mmcr0_fc56 field using pmc1
The testcase uses event code 0x1001e to verify two bit settings (FC5-6
and PMC1CE) in Monitor Mode Control Register 0 (MMCR0). Check if FC5-6
bit to be set in MMCR0 when not using Performance Monitor Counter 5 and
6 (PMC5 and PMC6). And also PMC1CE is expected to be set when using
PMC1. Test if these fields are programmed correctly via perf interface.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-14-kjain@linux.ibm.com
2022-03-01 23:39:54 +11:00
Athira Rajeev
9ac7c6d5e4 selftests/powerpc/pmu/: Add interface test for mmcr0_pmcjce field
The testcase uses event code 0x500fa ("instructions") to verify the
PMCjCE bit setting in Monitor Mode Control Register 0 (MMCR0). This bit
is expected to be set in MMCR0 when using Performance Monitor Counter
5 (PMC5). Checks if perf interface sets this bit correctly.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-13-kjain@linux.ibm.com
2022-03-01 23:39:41 +11:00
Athira Rajeev
b24142b9d2 selftests/powerpc/pmu/: Add interface test for mmcr0_pmccext bit
The testcase uses cycles event to check the PMCCEXT bit setting in
Monitor Mode Control Register 0 (MMCR0). Check if perf interface sets
this control bit in MMCR0 on ISA v3.1 platform.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-12-kjain@linux.ibm.com
2022-03-01 23:39:27 +11:00
Athira Rajeev
a7c0ab2e61 selftests/powerpc/pmu/: Add interface test for mmcr0_cc56run field
The testcase uses event code 0x500fa ("instructions") to check the
CC56RUN bit setting in Monitor Mode Control Register 0(MMCR0). In ISA
v3.1 platform, this bit is expected to be set in MMCR0 when using
Performance Monitor Counter 5 and 6 (PMC5 and PMC6). Verify this is done
correctly by perf interface.

CC56RUN bit makes PMC5 and PMC6 count regardless of the run latch state.
This bit is set in power10 since PMC5 and PMC6 is used in power10 for
counting instructions and cycles. Hence added a check to skip this test
in other platforms

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-11-kjain@linux.ibm.com
2022-03-01 23:39:09 +11:00
Athira Rajeev
eb7aa044df selftests/powerpc/pmu/: Add interface test for mmcr0 exception bits
The testcase uses "instructions" event to verify two bits(PMAE and PMAO)
in Monitor Mode Control Register 0 (MMCR0). At the time of interrupt,
pmae bit ( which enables performance monitor exception ) is expected to
be cleared and pmao (which indicates performance monitor alert) bit is
expected to be set in MMCR0. And testcases handles these checks.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-10-kjain@linux.ibm.com
2022-03-01 23:38:46 +11:00
Kajol Jain
13307f9584 selftests/powerpc/pmu: Add macro to extract mmcr3 and mmcra fields
Add macro and utility functions to fetch individual fields from Monitor
Mode Control Register 3(MMCR3)and Monitor Mode Control Register A(MMCRA)
PMU registers

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-9-kjain@linux.ibm.com
2022-03-01 23:38:13 +11:00
Athira Rajeev
2b49e64106 selftests/powerpc/pmu: Add macro to extract mmcr0/mmcr1 fields
Add macro and utility functions to fetch individual fields from Monitor
Mode Control Register 0(MMCR0) and Monitor Mode Control Register
1(MMCR1) PMU register.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-8-kjain@linux.ibm.com
2022-03-01 23:38:13 +11:00
Madhavan Srinivasan
79c4e6aba8 selftests/powerpc/pmu: Add macros to extract mmcr fields
Along with it, Add macros and utility functions to fetch individual
fields from Monitor Mode Control Register 2(MMCR2) register.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-7-kjain@linux.ibm.com
2022-03-01 23:38:13 +11:00
Madhavan Srinivasan
54d4ba7f22 selftests/powerpc/pmu: Add event_init_sampling function
Extended event_init_opts() to include initialization of sampling
testcases. Patch adds an event_init_sampling() wrapper to initialize
event attribute fields for sampling events. This includes initializing
sample period, sample type and event type.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-6-kjain@linux.ibm.com
2022-03-01 23:38:13 +11:00
Kajol Jain
5f6c3061af selftests/powerpc/pmu: Add utility functions to post process the mmap buffer
Add couple of basic utility functions to post process the mmap buffer.
It includes function to read the total number of samples present in the
mmap buffer and function to get the address of the first sample.

Add function "get_intr_regs" which will return pointer to interrupt
registers present in the sample, incase sample type
PERF_SAMPLE_REGS_INTR is set.

Add functions "get_reg_value" which can be used to read any interrupt
register value from a given sample.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-5-kjain@linux.ibm.com
2022-03-01 23:38:13 +11:00
Madhavan Srinivasan
6523dce862 selftests/powerpc/pmu: Add macros to parse event codes
Each platform has raw event encoding format which specifies the bit
positions for different fields. The fields from event code gets
translated into performance monitoring mode control register (MMCRx)
settings. Patch add macros to extract individual fields from the event
code.

Add functions for sanity checks, since testcases currently are only
supported in power9 and power10.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Read PVR directly rather than using /proc/cpuinfo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-4-kjain@linux.ibm.com
2022-03-01 23:38:12 +11:00
Athira Rajeev
c315669e2f selftests/powerpc/pmu: Add support for perf sampling tests
Add support functions for enabling perf sampling test in a new folder
"sampling_tests" under "selftests/powerpc/pmu". This includes support
functions for allocating and processing the mmap buffer. These functions
are added/defined in "sampling_tests/misc.*" files.

Also updates the corresponding Makefiles in "selftests/powerpc" and
"sampling_tests" folder.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Drop unneeded bits from the Makefile]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-3-kjain@linux.ibm.com
2022-03-01 23:37:49 +11:00
Athira Rajeev
f961e20f15 selftests/powerpc/pmu: Include mmap_buffer field as part of struct event
To enable the capturing of samples as part of perf event, add a new
field "mmap_buffer" to "struct event". This field is a place-holder for
sample collection

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220127072012.662451-2-kjain@linux.ibm.com
2022-02-28 11:25:52 +11:00
Guo Zhengkui
8a0edc72be powerpc/module_64: fix array_size.cocci warning
Fix following coccicheck warning:
./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE.

ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that arr
is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and more
standard.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220223075426.20939-1-guozhengkui@vivo.com
2022-02-24 17:53:55 +11:00
Nicholas Piggin
8b91cee5ea powerpc/64s/hash: Make hash faults work in NMI context
Hash faults are not resoved in NMI context, instead causing the access
to fail. This is done because perf interrupts can get backtraces
including walking the user stack, and taking a hash fault on those could
deadlock on the HPTE lock if the perf interrupt hits while the same HPTE
lock is being held by the hash fault code. The user-access for the stack
walking will notice the access failed and deal with that in the perf
code.

The reason to allow perf interrupts in is to better profile hash faults.

The problem with this is any hash fault on a kernel access that happens
in NMI context will crash, because kernel accesses must not fail.

Hard lockups, system reset, machine checks that access vmalloc space
including modules and including stack backtracing and symbol lookup in
modules, per-cpu data, etc could all run into this problem.

Fix this by disallowing perf interrupts in the hash fault code (the
direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs
to extend to the preload case). This simplifies the tricky logic in hash
faults and perf, at the cost of reduced profiling of hash faults.

perf can still latch addresses when interrupts are disabled, it just
won't get the stack trace at that point, so it would still find hot
spots, just sometimes with confusing stack chains.

An alternative could be to allow perf interrupts here but always do the
slowpath stack walk if we are in nmi context, but that slows down all
perf interrupt stack walking on hash though and it does not remove as
much tricky code.

Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com
2022-02-24 12:46:54 +11:00
Christophe Leroy
406a8c1d8f powerpc: Remove remaining stab codes
Following commit 1231816373 ("powerpc/32: Remove remaining .stabs
annotations"), stabs code are not used anymore.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8b33342d7454f6ca4f368f5206896558dfa06f4.1645538722.git.christophe.leroy@csgroup.eu
2022-02-23 14:49:27 +11:00
Christophe Leroy
5e5a6c5441 lkdtm: Add a test for function descriptors protection
Add WRITE_OPD to check that you can't modify function
descriptors.

Gives the following result when function descriptors are
not protected:

	lkdtm: Performing direct entry WRITE_OPD
	lkdtm: attempting bad 16 bytes write at c00000000269b358
	lkdtm: FAIL: survived bad write
	lkdtm: do_nothing was hijacked!

Looks like a standard compiler barrier() is not enough to force
GCC to use the modified function descriptor. Had to add a fake empty
inline assembly to force GCC to reload the function descriptor.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7eeba50d16a35e9d799820e43304150225f20197.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:12 +11:00
Christophe Leroy
72a8643304 lkdtm: Fix execute_[user]_location()
execute_location() and execute_user_location() intent
to copy do_nothing() text and execute it at a new location.
However, at the time being it doesn't copy do_nothing() function
but do_nothing() function descriptor which still points to the
original text. So at the end it still executes do_nothing() at
its original location allthough using a copied function descriptor.

So, fix that by really copying do_nothing() text and build a new
function descriptor by copying do_nothing() function descriptor and
updating the target address with the new location.

Also fix the displayed addresses by dereferencing do_nothing()
function descriptor.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4055839683d8d643cd99be121f4767c7c611b970.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:12 +11:00
Christophe Leroy
b64913394f lkdtm: Really write into kernel text in WRITE_KERN
WRITE_KERN is supposed to overwrite some kernel text, namely
do_overwritten() function.

But at the time being it overwrites do_overwritten() function
descriptor, not function text.

Fix it by dereferencing the function descriptor to obtain
function text pointer. Export dereference_function_descriptor()
for when LKDTM is built as a module.

And make do_overwritten() noinline so that it is really
do_overwritten() which is called by lkdtm_WRITE_KERN().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/31e58eaffb5bc51c07d8d4891d1982100ade8cfc.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:12 +11:00
Christophe Leroy
69b420ed8f lkdtm: Force do_nothing() out of line
LKDTM tests display that the run do_nothing() at a given
address, but in reality do_nothing() is inlined into the
caller.

Force it out of line so that it really runs text at the
displayed address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5dcf4d2088e6aca47ab3b4c6d5c0f7fa064e25a.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
e1478d8eaf asm-generic: Refactor dereference_[kernel]_function_descriptor()
dereference_function_descriptor() and
dereference_kernel_function_descriptor() are identical on the
three architectures implementing them.

Make them common and put them out-of-line in kernel/extable.c
which is one of the users and has similar type of functions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/449db09b2eba57f4ab05f80102a67d8675bc8bcd.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
0dc690e4ef asm-generic: Define 'func_desc_t' to commonly describe function descriptors
We have three architectures using function descriptors, each with its
own type and name.

Add a common typedef that can be used in generic code.

Also add a stub typedef for architecture without function descriptors,
to avoid a forest of #ifdefs.

It replaces the similar 'func_desc_t' previously defined in
arch/powerpc/kernel/module_64.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f1f91b142b3c1082bdc1586ce71c9bac1e75213c.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
a257cacc38 asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS
Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option
named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of
'dereference_function_descriptor' macro to know whether an
arch has function descriptors.

To limit churn in one of the following patches, use
an #ifdef/#else construct with empty first part
instead of an #ifndef in asm-generic/sections.h

On powerpc, make sure the config option matches the ABI used
by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2
when calling 'sparse' so that sparse sees the same piece of
code as GCC.

And include a helper to check whether an arch has function
descriptors or not : have_function_descriptors()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
41a88b4547 ia64: Rename 'ip' to 'addr' in 'struct fdesc'
There are three architectures with function descriptors, try to
have common names for the address they contain in order to
refactor some functions into generic functions later.

powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'

Vote for 'addr' and update 'struct fdesc' accordingly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65b73ac614e4c002c5819d40b42f6f426d2ee52b.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
2fd986377d powerpc: Prepare func_desc_t for refactorisation
In preparation of making func_desc_t generic, change the ELFv2
version to a struct containing 'addr' element.

This allows using single helpers common to ELFv1 and ELFv2 and
reduces the amount of #ifdef's

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5c36105e08b27b98450535bff48d71b690c19739.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00
Christophe Leroy
0a9c5ae279 powerpc: Remove 'struct ppc64_opd_entry'
'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h

It was initially in module_64.c and commit 2d291e9027 ("Fix compile
failure with non modular builds") moved it into asm/elf.h

But it was by mistake added outside of __KERNEL__ section,
therefore commit c3617f7203 ("UAPI: (Scripted) Disintegrate
arch/powerpc/include/asm") moved it to uapi/asm/elf.h

Now that it is not used anymore by the kernel, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c309ccee65ec2e3802df7a7fe761d0a298584809.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16 23:25:11 +11:00