Commit graph

4521 commits

Author SHA1 Message Date
Aneesh Kumar K.V
738f964555 powerpc/mm: Use page fragments for allocation page table at PMD level
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:12 +10:00
Aneesh Kumar K.V
8a6c697b99 powerpc/mm: Implement helpers for pagetable fragment support at PMD level
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:12 +10:00
Aneesh Kumar K.V
0c4d268029 powerpc/book3s64/mm: Simplify the rcu callback for page table free
Instead of encoding shift in the table address, use an enumerated index value.
This allow us to do different things in the callback for pte and pmd.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V
1c7ec8a40a powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable fragment
4K config use one full page at level 4 of the pagetable. Add support for single
fragment allocation in pagetable fragment code and and use that for 4K config.
This makes both 4k and 64k use the same code path. Later we will switch pmd to
use the page table fragment code. This is done only for 64bit platforms which
is using page table fragment support.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V
702346768c powerpc/mm/nohash: Remove pte fragment dependency from nohash
Now that we have removed 64K page size support, the RCU page table free can
be much simpler for nohash. Make a copy of the the rcu callback to pgalloc.h
header similar to nohash 32. We could possibly merge 32 and 64 bit there. But
that is for a later patch

We also move the book3s specific handler to pgtable_book3s64.c. This will be
updated in a later patch to handle split pmd ptlock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V
7820856a4f powerpc/mm/book3e/64: Remove unsupported 64Kpage size from 64bit booke
We have in Kconfig

config PPC_64K_PAGES
	bool "64k page size"
	depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
	select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64

Only supported BOOK3E 64 bit platforms is FSL_BOOK3E. Remove the dead 64k page
support code from 64bit nohash.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:10 +10:00
Bryant G. Ly
0eca353e7a misc: IBM Virtual Management Channel Driver (VMC)
This driver is a logical device which provides an
interface between the hypervisor and a management
partition. This interface is like a message
passing interface. This management partition
is intended to provide an alternative to HMC-based
system management.

VMC enables the Management LPAR to provide basic
logical partition functions:
- Logical Partition Configuration
- Boot, start, and stop actions for individual
  partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

This driver is to be used for the POWER Virtual
Management Channel Virtual Adapter on the PowerPC
platform. It provides a character device which
allows for both request/response and async message
support through the /dev/ibmvmc node.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
Tested-by: Brad Warrum <bwarrum@us.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-14 16:35:42 +02:00
Frederic Weisbecker
1321a5de1e softirq/powerpc: Switch to generic local_softirq_pending() implementation
Remove the ad-hoc implementation, the generic code now allows us not to
reinvent the wheel.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:25:28 +02:00
Al Viro
4c1481ae60 powerpc/syscalls: timer_create can be handle by perfectly normal COMPAT_SYS_SPU
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:16 +10:00
Al Viro
28b9c34aa6 powerpc/syscalls: kill ppc32_select()
it had always been pointless - compat_sys_select() sign-extends
the first argument just fine on its own.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Use COMPAT_SPU_NEW() to keep systbl_chk.sh happy]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:15 +10:00
Al Viro
4c392e6591 powerpc/syscalls: switch rtas(2) to SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:14 +10:00
Al Viro
f3675644e1 powerpc/syscalls: signal_{32, 64} - switch to SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Fix sys_debug_setcontext() prototype to return long]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:13 +10:00
Michael Ellerman
8f2133cc0e powerpc/pseries: hcall_exit tracepoint retval should be signed
The hcall_exit() tracepoint has retval defined as unsigned long. That
leads to humours results like:

  bash-3686  [009] d..2   854.134094: hcall_entry: opcode=24
  bash-3686  [009] d..2   854.134095: hcall_exit: opcode=24 retval=18446744073709551609

It's normal for some hcalls to return negative values, displaying them
as unsigned isn't very helpful. So change it to signed.

  bash-3711  [001] d..2   471.691008: hcall_entry: opcode=24
  bash-3711  [001] d..2   471.691008: hcall_exit: opcode=24 retval=-7

Which can be more easily compared to H_NOT_FOUND in hvcall.h

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
2018-05-10 23:17:43 +10:00
Michael Ellerman
d86a4ba02f powerpc/pkeys: Drop private VM_PKEY definitions
Now that we've updated the generic headers to support 5 PKEY bits for
powerpc we don't need our own #defines in arch code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 19:46:10 +10:00
Michael Ellerman
dbec10e58d mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h
Consolidate the pkey handling by providing a common empty definition
of vma_pkey() in pkeys.h when CONFIG_ARCH_HAS_PKEYS=n.

This also removes another entanglement of pkeys.h and
asm/mmu_context.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2018-05-09 11:50:41 +10:00
Ram Pai
5212213aa5 mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence lets define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-09 00:35:33 +10:00
Michael Ellerman
6c0a8f6b5a powerpc/pseries: Fix CONFIG_NUMA=n build
The build is failing with CONFIG_NUMA=n and some compiler versions:

  arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
  hotplug-cpu.c:(.text+0x12c): undefined reference to `timed_topology_update'
  arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_cpu_remove':
  hotplug-cpu.c:(.text+0x400): undefined reference to `timed_topology_update'

Fix it by moving the empty version of timed_topology_update() into the
existing #ifdef block, which has the right guard of SPLPAR && NUMA.

Fixes: cee5405da4 ("powerpc/hotplug: Improve responsiveness of hotplug change")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-08 14:59:56 +10:00
Naveen N. Rao
edf6a2dfe3 powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix
Some syscall entry functions on powerpc are prefixed with
ppc_/ppc32_/ppc64_ rather than the usual sys_/__se_sys prefix. fork(),
clone(), swapcontext() are some examples of syscalls with such entry
points. We need to match against these names when initializing ftrace
syscall tracing.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:29:44 +10:00
Naveen N. Rao
0b7758aaf6 powerpc/trace/syscalls: Update syscall name matching logic
On powerpc64 ABIv1, we are enabling syscall tracing for only ~20
syscalls. This is due to commit e145242ea0 ("syscalls/core,
syscalls/x86: Clean up syscall stub naming convention") which has
changed the syscall entry wrapper prefix from "SyS" to "__se_sys".

Update the logic for ABIv1 to not just skip the initial dot, but also
the "__se_sys" prefix.

Fixes: commit e145242ea0 ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:29:30 +10:00
Michael Ellerman
c4ec1f0353 powerpc/64: Remove unused paca->soft_enabled
In commit 4e26bc4a4e ("powerpc/64: Rename soft_enabled to
irq_soft_mask") we renamed paca->soft_enabled. But then in commit
8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not
virtualized") we added it back. Oops. This happened because the two
patches were in flight at the same time and rebased vs each other
multiple times, and we missed it in review.

Fixes: 8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:22:26 +10:00
Christophe Leroy
d5808ffaec powerpc/nohash: Use IS_ENABLED() to simplify __set_pte_at()
By using IS_ENABLED() we can simplify __set_pte_at() by removing
redundant *ptep = pte.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:47 +10:00
Christophe Leroy
0d594aa75f powerpc/nohash: Remove _PAGE_BUSY
_PAGE_BUSY is always 0, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:46 +10:00
Christophe Leroy
45201c8794 powerpc/nohash: Remove hash related code from nohash headers.
When nohash and book3s header were split, some hash related stuff
remained in the nohash header. This patch removes them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Duplicate pte_young() to avoid circular header dependency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:46 +10:00
Christoph Hellwig
325ef1857f PCI: remove PCI_DMA_BUS_IS_PHYS
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2018-05-07 07:15:41 +02:00
Hari Bathini
8597538712 powerpc/fadump: Do not use hugepages when fadump is active
FADump capture kernel boots in restricted memory environment preserving
the context of previous kernel to save vmcore. Supporting hugepages in
such environment makes things unnecessarily complicated, as hugepages
need memory set aside for them. This means most of the capture kernel's
memory is used in supporting hugepages. In most cases, this results in
out-of-memory issues while booting FADump capture kernel. But hugepages
are not of much use in capture kernel whose only job is to save vmcore.
So, disabling hugepages support, when fadump is active, is a reliable
solution for the out of memory issues. Introducing a flag variable to
disable HugeTLB support when fadump is active.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 23:09:25 +10:00
Michael Ellerman
0c0c52306f powerpc: Only support DYNAMIC_FTRACE not static
We've had dynamic ftrace support for over 9 years since Steve first
wrote it, all the distros use dynamic, and static is basically
untested these days, so drop support for static ftrace.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:29 +10:00
Naveen N. Rao
ae30cc05be powerpc64/ftrace: Implement support for ftrace_regs_caller()
With -mprofile-kernel, we always save the full register state in
ftrace_caller(). While this works, this is inefficient if we're not
interested in the register state, such as when we're using the function
tracer.

Rename the existing ftrace_caller() as ftrace_regs_caller() and provide
a simpler implementation for ftrace_caller() that is used when registers
are not required to be saved.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:29 +10:00
Naveen N. Rao
acd55b1005 powerpc64/ftrace: Add helpers to hard disable ftrace
Add some helpers to enable/disable ftrace through paca->ftrace_enabled.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:26 +10:00
Naveen N. Rao
c3e59d7784 powerpc64/ftrace: Rearrange #ifdef sections in ftrace.h
Re-arrange the last #ifdef section in preparation for a subsequent
change.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:25 +10:00
Naveen N. Rao
ea678ac627 powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths
We have some C code that we call into from real mode where we cannot
take any exceptions. Though the C functions themselves are mostly safe,
if these functions are traced, there is a possibility that we may take
an exception. For instance, in certain conditions, the ftrace code uses
WARN(), which uses a 'trap' to do its job.

For such scenarios, introduce a new field in paca 'ftrace_enabled',
which is checked on ftrace entry before continuing. This field can then
be set to zero to disable/pause ftrace, and set to a non-zero value to
resume ftrace.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:25 +10:00
Thomas Gleixner
604a98f1df Merge branch 'timers/urgent' into timers/core
Pick up urgent fixes to apply dependent cleanup patch
2018-05-02 16:11:12 +02:00
Eric W. Biederman
e821fa4245 signal/powerpc: Replace TRAP_FIXME with TRAP_UNK
Using an si_code of 0 that aliases with SI_USER is clearly the wrong
thing todo, and causes problems in interesting ways.

For use in unknown_exception the recently defined TRAP_UNK
semantically is a perfect fit.  For use in RunModeException it looks
like something more specific than TRAP_UNK could be used.  No one has
bothered to find a better fit than the broken si_code of 0 in all of
these years and I don't see an obvious better fit so TRAP_UNK is
switching RunModeException to return TRAP_UNK is clearly an
improvement.

Recent history suggests no actually cares about crazy corner
cases of the kernel behavior like this so I don't expect any
regressions from changing this.  However if something does
happen this change is easy to revert.

Though I wonder if SIGKILL might not be a better fit.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:58 -05:00
Eric W. Biederman
aeb1c0f6ff signal/powerpc: Replace FPE_FIXME with FPE_FLTUNK
Using an si_code of 0 that aliases with SI_USER is clearly the
wrong thing todo, and causes problems in interesting ways.

The newly defined FPE_FLTUNK semantically appears to fit the
bill so use it instead.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:  linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:55 -05:00
Alistair Popple
a1409adac7 powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters
There is a single npu context per set of callback parameters. Callers
should be prevented from overwriting existing callback values so
instead return an error if different parameters are passed.

Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24 09:46:57 +10:00
Arnd Bergmann
d0b67de998 y2038: powerpc: Extend sysvipc data structures
powerpc, uses a nonstandard variation of the generic sysvipc
data structures, intended to have the padding moved around
so it can deal with big-endian 32-bit user space that has
64-bit time_t.

powerpc has the same definition as parisc and sparc, but now also
supports little-endian mode, which is now wrong because the
padding is made for big-endian user space.

This takes just take the same approach here that we have for
the asm-generic headers and adds separate 32-bit fields for the
upper halves of the timestamps, to let libc deal with the mess
in user space.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-20 16:20:13 +02:00
Deepa Dinamani
0d55303c51 compat: Move compat_timespec/ timeval to compat_time.h
All the current architecture specific defines for these
are the same. Refactor these common defines to a common
header file.

The new common linux/compat_time.h is also useful as it
will eventually be used to hold all the defines that
are needed for compat time types that support non y2038
safe types. New architectures need not have to define these
new types as they will only use new y2038 safe syscalls.
This file can be deleted after y2038 when we stop supporting
non y2038 safe syscalls.

The patch also requires an operation similar to:

git grep "asm/compat\.h" | cut -d ":" -f 1 |  xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g"

Cc: acme@kernel.org
Cc: benh@kernel.crashing.org
Cc: borntraeger@de.ibm.com
Cc: catalin.marinas@arm.com
Cc: cmetcalf@mellanox.com
Cc: cohuck@redhat.com
Cc: davem@davemloft.net
Cc: deller@gmx.de
Cc: devel@driverdev.osuosl.org
Cc: gerald.schaefer@de.ibm.com
Cc: gregkh@linuxfoundation.org
Cc: heiko.carstens@de.ibm.com
Cc: hoeppner@linux.vnet.ibm.com
Cc: hpa@zytor.com
Cc: jejb@parisc-linux.org
Cc: jwi@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: mark.rutland@arm.com
Cc: mingo@redhat.com
Cc: mpe@ellerman.id.au
Cc: oberpar@linux.vnet.ibm.com
Cc: oprofile-list@lists.sf.net
Cc: paulus@samba.org
Cc: peterz@infradead.org
Cc: ralf@linux-mips.org
Cc: rostedt@goodmis.org
Cc: rric@kernel.org
Cc: schwidefsky@de.ibm.com
Cc: sebott@linux.vnet.ibm.com
Cc: sparclinux@vger.kernel.org
Cc: sth@linux.vnet.ibm.com
Cc: ubraun@linux.vnet.ibm.com
Cc: will.deacon@arm.com
Cc: x86@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: James Hogan <jhogan@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-19 13:29:54 +02:00
Linus Torvalds
b1cb4f93b5 powerpc fixes for 4.17 #2
- Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
    value by adding CONFIG_RELOCATABLE to vermagic.
 
  - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
    from firmware.
 
  - Remove tlbie trace points from KVM code that's called in real mode, because
    it causes crashes.
 
  - Fix checkstops caused by invalid tlbiel on Power9 Radix.
 
  - Ensure the set of CPU features we "know" are always enabled is actually the
    minimal set when we build with support for firmware supplied CPU features.
 
 Thanks to:
   Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJa0oWBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAV
 EA//UQB7n33HlroMBL3841VswUrIgY36gSi9+QZPjxHiTYGVStI+u0FHq5hm8OMm
 1FkronD0sfbN7t8LJ9FS6vCoAlW15vMBj95pWJXAL7NuQneO7cM5JvRbowVKbNbq
 GESn4EkUj9bAXl7aTzX2yA7jruzMJIwE/2bgl1J6YkpByHx5x/fgzKTuoCRggQqY
 Xd656ZZyWR/uIuWhAjCPFdrPnekVvo7/Gy5Yo5kcR6LbX4MO0JFNOHpiT3wPGGv/
 LJedJtdpH1biMk7SrqLfWD7mpMsVJeMelpkftEbe8zUXf17wBd1rl4JkybLTDwZD
 HznZ0nbnh0j5Q/KRHwOh0sLDSisJF6pQHH/Bnc4Xqrsn4OERwEk40/Ym/O0kDsjH
 n2F3X5a5QLWoBWRsdV3BMyEzc0bgnb++tXeBWOV4jJBEOhoqxwimCU+3fmLiy95A
 urJYf8Sljwve9I5kJyXKpruopgeoJ1aumRwopE8YCG9lfBgzvU/90u6+uKqAdkOv
 kpZxS5FDT2lNIwbCP9WjEelAOGrtA6JQNWU0iPGwsVAvAxX/p1RwwbQeWVlWs3Ek
 UdVF8eGezClpLPa/FQFdDyXfGE6WIf1vK9YnG7k3rUwwkSqoYxKgVQ3o1Vetp0Lg
 fzPsDdDi2V2I3KDYNav8s6kohAIIS/a9XYk4BGvT/htRnUg=
 =0e7c
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
  powerpc/8xx: Fix build with hugetlbfs enabled
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/fscr: Enable interrupts earlier before calling get_user()
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-15 11:57:12 -07:00
AKASHI Takahiro
9ec4ecef0a kexec_file,x86,powerpc: factor out kexec_file_ops functions
As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array.  So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Michael Ellerman
81b654c273 powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
The cpu_has_feature() mechanism has an optimisation where at build
time we construct a mask of the CPU feature bits that will always be
true for the given .config, based on the platform/bitness/etc. that we
are building for.

That is incompatible with DT CPU features, where the set of CPU
features is dependent on feature flags that are given to us by
firmware.

The result is that some feature bits can not be *disabled* by DT CPU
features. Or more accurately, they can be disabled but they will still
appear in the ALWAYS mask, meaning cpu_has_feature() will always
return true for them.

In the past this hasn't really been a problem because on Book3S
64 (where we support DT CPU features), the set of ALWAYS bits has been
very small. That was because we always built for POWER4 and later,
meaning the set of common bits was small.

The only bit that could be cleared by DT CPU features that was also in
the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
the alignment handler to create a fake DSISR. That code was itself
deleted in 31bfdb036f ("powerpc: Use instruction emulation
infrastructure to handle alignment faults") (Sep 2017).

However the set of ALWAYS features changed with the recent commit
db5ae1c155 ("powerpc/64s: Refine feature sets for little endian
builds") which restricted the set of feature flags when building
little endian to Power7 or later. That caused the ALWAYS mask to
become much larger for little endian builds.

The result is that the following feature bits can currently not
be *disabled* by DT CPU features:

  CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
  CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
  CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

To fix it we need to mask the set of ALWAYS features with the base set
of DT CPU features, ie. the features that are always enabled by DT CPU
features. That way there are no bits in the ALWAYS mask that are not
also always set by DT CPU features.

Fixes: db5ae1c155 ("powerpc/64s: Refine feature sets for little endian builds")
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-13 23:51:44 +10:00
Nicholas Piggin
34dd25de9f powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
This is the start of an effort to tidy up and standardise all the
delays. Existing loops have a range of delay/sleep periods from 1ms
to 20ms, and some have no delay. They all loop forever except rtc,
which times out after 10 retries, and that uses 10ms delays. So use
10ms as our standard delay. The OPAL maintainer agrees 10ms is a
reasonable starting point.

The idea is to use the same recipe everywhere, once this is proven to
work then it will be documented as an OPAL API standard. Then both
firmware and OS can agree, and if a particular call needs something
else, then that can be documented with reasoning.

This is not the end-all of this effort, it's just a relatively easy
change that fixes some existing high latency delays. There should be
provision for standardising timeouts and/or interruptible loops where
possible, so non-fatal firmware errors don't cause hangs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 00:05:23 +10:00
Linus Torvalds
d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00
Michael Ellerman
73aca179d7 powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
If you build the kernel with CONFIG_RELOCATABLE=n, then install the
modules, rebuild the kernel with CONFIG_RELOCATABLE=y and leave the
old modules installed, we crash something like:

  Unable to handle kernel paging request for data at address 0xd000000018d66cef
  Faulting instruction address: 0xc0000000021ddd08
  Oops: Kernel access of bad area, sig: 11 [#1]
  Modules linked in: x_tables autofs4
  CPU: 2 PID: 1 Comm: systemd Not tainted 4.16.0-rc6-gcc_ubuntu_le-g99fec39 #1
  ...
  NIP check_version.isra.22+0x118/0x170
  Call Trace:
    __ksymtab_xt_unregister_table+0x58/0xfffffffffffffcb8 [x_tables] (unreliable)
    resolve_symbol+0xb4/0x150
    load_module+0x10e8/0x29a0
    SyS_finit_module+0x110/0x140
    system_call+0x58/0x6c

This happens because since commit 71810db27c ("modversions: treat
symbol CRCs as 32 bit quantities"), a relocatable kernel encodes and
handles symbol CRCs differently from a non-relocatable kernel.

Although it's possible we could try and detect this situation and
handle it, it's much more robust to simply make the state of
CONFIG_RELOCATABLE part of the module vermagic.

Fixes: 71810db27c ("modversions: treat symbol CRCs as 32 bit quantities")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-09 13:21:02 +10:00
Linus Torvalds
49a695ba72 powerpc updates for 4.17
Notable changes:
 
  - Support for 4PB user address space on 64-bit, opt-in via mmap().
 
  - Removal of POWER4 support, which was accidentally broken in 2016 and no one
    noticed, and blocked use of some modern instructions.
 
  - Workarounds so that the hypervisor can enable Transactional Memory on Power9.
 
  - A series to disable the DAWR (Data Address Watchpoint Register) on Power9.
 
  - More information displayed in the meltdown/spectre_v1/v2 sysfs files.
 
  - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome.
 
  - A big series to make the allocation of our pacas (per cpu area), kernel page
    tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9.
 
 And as usual many fixes, reworks and cleanups.
 
 Thanks to:
   Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy
   Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin
   Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens,
   Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă,
   Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier,
   Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus
   Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de
   Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
   Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool,
   Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar
   Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant
   Hegde, Wei Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJayKxDExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAr
 JQ/6A9Xs4zHDn9OeT9esEIxciETqUlrP0Wp64c4JVC7EkG1E7xRDZ4Xb4m8R2nNt
 9sPhtNO1yCtEk6kFQtPNB0N8v6pud4I6+aMcYnn+tP8mJRYQ4x9bYaF3Hw98IKmE
 Kd6TglmsUQvh2GpwPiF93KpzzWu1HB2kZzzqJcAMTMh7C79Qz00BjrTJltzXB2jx
 tJ+B4lVy8BeU8G5nDAzJEEwb5Ypkn8O40rS/lpAwVTYOBJ8Rbyq8Fj82FeREK9YO
 4EGaEKPkC/FdzX7OJV3v2/nldCd8pzV471fAoGuBUhJiJBMBoBybcTHIdDex7LlL
 zMLV1mUtGo8iolRPhL8iCH+GGifZz2WzstYCozz7hgIraWtc/frq9rZp6q0LdH/K
 trk7UbPGlVb92ecWZVpZyEcsMzKrCgZqnAe9wRNh1uEKScEdzd/bmRaMhENUObRh
 Hili6AVvmSKExpy7k2sZP/oUMaeC15/xz8Lk7l8a/iCkYhNmPYh5iSXM5+UKpcRT
 FYOcO0o3DwXsN46Whow3nJ7TqAsDy9/ecPUG71JQi3ZrHnRrm8jxkn8MCG5pZ1Fi
 KvKDxlg6RiJo3DF9/fSOpJUokvMwqBS5dJo4eh5eiDy94aBTqmBKFecvPxQm7a0L
 l3uXCF/6JuXEvMukFjGBO4RiYhw8i+B2uKsh81XUh7HKrgE=
 =HAB1
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for 4PB user address space on 64-bit, opt-in via mmap().

   - Removal of POWER4 support, which was accidentally broken in 2016
     and no one noticed, and blocked use of some modern instructions.

   - Workarounds so that the hypervisor can enable Transactional Memory
     on Power9.

   - A series to disable the DAWR (Data Address Watchpoint Register) on
     Power9.

   - More information displayed in the meltdown/spectre_v1/v2 sysfs
     files.

   - A vpermxor (Power8 Altivec) implementation for the raid6 Q
     Syndrome.

   - A big series to make the allocation of our pacas (per cpu area),
     kernel page tables, and per-cpu stacks NUMA aware when using the
     Radix MMU on Power9.

  And as usual many fixes, reworks and cleanups.

  Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
  Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
  Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
  Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
  Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
  Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
  Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
  Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
  Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
  Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
  Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
  Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
  Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

* tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
  powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
  powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
  powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
  powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
  Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
  powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
  powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
  cxl: Fix possible deadlock when processing page faults from cxllib
  powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
  powerpc/mm/radix: Update command line parsing for disable_radix
  powerpc/mm/radix: Parse disable_radix commandline correctly.
  powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
  powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
  powerpc/mm/keys: Update documentation and remove unnecessary check
  powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
  powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: hard disable irqs in smp_send_stop loop
  powerpc: use NMI IPI for smp_send_stop
  powerpc/powernv: Fix SMT4 forcing idle code
  ...
2018-04-07 12:08:19 -07:00
Linus Torvalds
3c0d551e02 pci-v4.17-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
 E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
 +yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
 /jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
 XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
 qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
 0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
 gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
 AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
 Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
 f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
 uLEf0DU7AEmdvzQ=
 =T++R
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - move pci_uevent_ers() out of pci.h (Michael Ellerman)

 - skip ASPM common clock warning if BIOS already configured it (Sinan
   Kaya)

 - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)

 - remove last user of pci_get_bus_and_slot() and the function itself
   (Sinan Kaya)

 - add decoding for 16 GT/s link speed (Jay Fang)

 - add interfaces to get max link speed and width (Tal Gilboa)

 - add pcie_bandwidth_capable() to compute max supported link bandwidth
   (Tal Gilboa)

 - add pcie_bandwidth_available() to compute bandwidth available to
   device (Tal Gilboa)

 - add pcie_print_link_status() to log link speed and whether it's
   limited (Tal Gilboa)

 - use PCI core interfaces to report when device performance may be
   limited by its slot instead of doing it in each driver (Tal Gilboa)

 - fix possible cpqphp NULL pointer dereference (Shawn Lin)

 - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
   hotplug (Mika Westerberg)

 - add support for PCI I/O port space that's neither directly accessible
   via CPU in/out instructions nor directly mapped into CPU physical
   memory space. This is fairly intrusive and includes minor changes to
   interfaces used for I/O space on most platforms (Zhichang Yuan, John
   Garry)

 - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
   John Garry)

 - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)

 - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
   (Shawn Lin)

 - report quirk timings with dev_info (Bjorn Helgaas)

 - report quirks that take longer than 10ms (Bjorn Helgaas)

 - add and use Altera Vendor ID (Johannes Thumshirn)

 - tidy Makefiles and comments (Bjorn Helgaas)

 - don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
   ia64, and mn10300 with x86 (Bjorn Helgaas)

 - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
   Lawler)

 - merge pcieport_if.h into portdrv.h (Bjorn Helgaas)

 - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
   Helgaas)

 - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)

 - remove portdrv link order dependency (Bjorn Helgaas)

 - remove support for unused VC portdrv service (Bjorn Helgaas)

 - simplify portdrv feature permission checking (Bjorn Helgaas)

 - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
   Helgaas)

 - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)

 - use cached AER capability offset (Frederick Lawler)

 - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)

 - rename pcie-dpc.c to dpc.c (Bjorn Helgaas)

 - use generic pci_mmap_resource_range() instead of powerpc and xtensa
   arch-specific versions (David Woodhouse)

 - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)

 - remove System and Video ROM reservations on sparc (Bjorn Helgaas)

 - probe for device reset support during enumeration instead of runtime
   (Bjorn Helgaas)

 - add ACS quirk for Ampere (née APM) root ports (Feng Kan)

 - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
   Vincent-Cross)

 - protect device restore with device lock (Sinan Kaya)

 - handle failure of FLR gracefully (Sinan Kaya)

 - handle CRS (config retry status) after device resets (Sinan Kaya)

 - skip various config reads for SR-IOV VFs as an optimization
   (KarimAllah Ahmed)

 - consolidate VPD code in vpd.c (Bjorn Helgaas)

 - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)

 - add DT support for R-Car r8a7743 (Biju Das)

 - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
   bridge driver that causes a general protection fault (Dexuan Cui)

 - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
   (Dexuan Cui)

 - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
   (Dexuan Cui)

 - make several structures static (Fengguang Wu)

 - increase number of MSI IRQs supported by Synopsys DesignWare bridges
   from 32 to 256 (Gustavo Pimentel)

 - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
   API from DesignWare drivers (Gustavo Pimentel)

 - add Tegra power management support (Manikanta Maddireddy)

 - add Tegra loadable module support (Manikanta Maddireddy)

 - handle 64-bit BARs correctly in endpoint support (Niklas Cassel)

 - support optional regulator for HiSilicon STB (Shawn Guo)

 - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)

 - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)

* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
  MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
  HISI LPC: Add ACPI support
  ACPI / scan: Do not enumerate Indirect IO host children
  ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
  HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
  of: Add missing I/O range exception for indirect-IO devices
  PCI: Apply the new generic I/O management on PCI IO hosts
  PCI: Add fwnode handler as input param of pci_register_io_range()
  PCI: Remove __weak tag from pci_register_io_range()
  MAINTAINERS: Add missing /drivers/pci/cadence directory entry
  fm10k: Report PCIe link properties with pcie_print_link_status()
  net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
  net/mlx5: Report PCIe link properties with pcie_print_link_status()
  net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
  PCI: Add pcie_print_link_status() to log link speed and whether it's limited
  PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
  misc: pci_endpoint_test: Handle 64-bit BARs properly
  PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
  PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
  PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
  ...
2018-04-06 18:31:06 -07:00
Dan Williams
09135cc594 mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize()
Patch series "mm, smaps: MMUPageSize for device-dax", v3.

Similar to commit 31383c6865 ("mm, hugetlbfs: introduce ->split() to
vm_operations_struct") here is another occasion where we want
special-case hugetlbfs/hstate enabling to also apply to device-dax.

This prompts the question what other hstate conversions we might do
beyond ->split() and ->pagesize(), but this appears to be the last of
the usages of hstate_vma() in generic/non-hugetlbfs specific code paths.

This patch (of 3):

The current powerpc definition of vma_mmu_pagesize() open codes looking
up the page size via hstate.  It is identical to the generic
vma_kernel_pagesize() implementation.

Now, vma_kernel_pagesize() is growing support for determining the page
size of Device-DAX vmas in addition to the existing Hugetlbfs page size
determination.

Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it
would automatically benefit from any new vma-type support that is added
to vma_kernel_pagesize().  However, the powerpc vma_mmu_pagesize() is
prevented from calling vma_kernel_pagesize() due to a circular header
dependency that requires vma_mmu_pagesize() to be defined before
including <linux/hugetlb.h>.

Break this circular dependency by defining the default vma_mmu_pagesize()
as a __weak symbol to be overridden by the powerpc version.

Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:26 -07:00
Nicholas Piggin
3a52f6014d powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and
above (which is what the dt_cpu_ftrs setup does). Fix cputable for
DD2.2 to match.

This came about due to patches b5af4f2793 ("powerpc: Add CPU feature
bits for TM bug workarounds on POWER9 v2.2"), and 9e9626ed3a
("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features") being
in-flight at once. The latter patch fixed dt_cpu_ftrs like this one
does. The former changed cputable to match dt_cpu_ftrs.

Fixes: b5af4f2793 ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05 16:28:13 +10:00
Logan Gunthorpe
ef237039c5 powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
Subsequent patches in this series makes use of the readq and writeq
defines in iomap.h. However, as is, they get missed on the powerpc
platform seeing the include comes before the define. This patch moves
the include down to fix this.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05 14:58:52 +10:00
Naveen N. Rao
5d6a03ebc8 powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
We get the below warning if we try to use kexec on P9:
   kexec_core: Starting new kernel
   WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140
   [snip]
   NIP __set_breakpoint+0xb4/0x140
   LR  kexec_prepare_cpus_wait+0x58/0x150
   Call Trace:
     0xc0000000ee70fb20 (unreliable)
     0xc0000000ee70fb20
     default_machine_kexec+0x234/0x2c0
     machine_kexec+0x84/0x90
     kernel_kexec+0xd8/0xe0
     SyS_reboot+0x214/0x2c0
     system_call+0x58/0x6c

This happens since we are trying to clear hw breakpoint on POWER9,
though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint()
within hw_breakpoint_disable() with ppc_breakpoint_available() to
address this.

Fixes: 9654153158 ("powerpc: Disable DAWR in the base POWER9 CPU features")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04 21:54:02 +10:00
Aneesh Kumar K.V
fb4e5dbd44 powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
With split PTL (page table lock) config, we allocate the level
4 (leaf) page table using pte fragment framework instead of slab cache
like other levels. This was done to enable us to have split page table
lock at the level 4 of the page table. We use page->plt backing the
all the level 4 pte fragment for the lock.

Currently with Radix, we use only 16 fragments out of the allocated
page. In radix each fragment is 256 bytes which means we use only 4k
out of the allocated 64K page wasting 60k of the allocated memory.
This was done earlier to keep it closer to hash.

This patch update the pte fragment count to 256, thereby using the
full 64K page and reducing the memory usage. Performance tests shows
really low impact even with THP disabled. With THP disabled we will be
contenting further less on level 4 ptl and hence the impact should be
further low.

  256 threads:
    without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
      median = 15678.5
      stdev = 42.1209

    with patch:
      median = 15354
      stdev = 194.743

This is with THP disabled. With THP enabled the impact of the patch
will be less.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04 16:58:06 +10:00
Linus Torvalds
4608f06453 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:

 1) Add support for ADI (Application Data Integrity) found in more
    recent sparc64 cpus. Essentially this is keyed based access to
    virtual memory, and if the key encoded in the virual address is
    wrong you get a trap.

    The mm changes were reviewed by Andrew Morton and others.

    Work by Khalid Aziz.

 2) Validate DAX completion index range properly, from Rob Gardner.

 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  sparc64: Make atomic_xchg() an inline function rather than a macro.
  sparc64: Properly range check DAX completion index
  sparc: Make auxiliary vectors for ADI available on 32-bit as well
  sparc64: Oracle DAX driver depends on SPARC64
  sparc64: Update signal delivery to use new helper functions
  sparc64: Add support for ADI (Application Data Integrity)
  mm: Allow arch code to override copy_highpage()
  mm: Clear arch specific VM flags on protection change
  mm: Add address parameter to arch_validate_prot()
  sparc64: Add auxiliary vectors to report platform ADI properties
  sparc64: Add handler for "Memory Corruption Detected" trap
  sparc64: Add HV fault type handlers for ADI related faults
  sparc64: Add support for ADI register fields, ASIs and traps
  mm, swap: Add infrastructure for saving page metadata on swap
  signals, sparc: Add signal codes for ADI violations
2018-04-03 14:08:58 -07:00
Nicholas Piggin
f2748bdfe1 powerpc/powernv: Always stop secondaries before reboot/shutdown
Currently powernv reboot and shutdown requests just leave secondaries
to do their own things. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, but also we don't
know what else they might be doing -- they might be causing trouble,
trampling memory, etc.

The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time, and it was fixed with 2196c6f1ed
("powerpc/powernv: Return secondary CPUs to firmware before FW update"),
which returns secondaries to opal. It's been found that regular reboots
can take over 10 seconds, which can result in the hard lockup watchdog
firing,

  reboot: Restarting system
  [  360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

This patch removes the special case for flash update, and calls
smp_send_stop in all cases before calling reboot/shutdown.

smp_send_stop could return CPUs to OPAL, the main reason not to is
that the request could come from a NMI that interrupts OPAL code,
so re-entry to OPAL can cause a number of problems. Putting
secondaries into simple spin loops improves the chances of a
successful reboot.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by:  Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03 22:59:57 +10:00
Mauricio Faria de Oliveira
e7347a8683 powerpc: Move default security feature flags
This moves the definition of the default security feature flags
(i.e., enabled by default) closer to the security feature flags.

This can be used to restore current flags to the default flags.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03 21:50:08 +10:00
Aneesh Kumar K.V
a6201da34f powerpc: Fix oops due to bad access of lppaca on bare metal
Commit 8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are
not virtualized") removed allocation of lppaca on bare metal
platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the
lppaca on bare metal in some code paths.

Fix this but adding runtime checks for SPLPAR (shared processor LPAR).

Fixes: 8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03 21:50:07 +10:00
Linus Torvalds
2fcd2b306a Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 dma mapping updates from Ingo Molnar:
 "This tree, by Christoph Hellwig, switches over the x86 architecture to
  the generic dma-direct and swiotlb code, and also unifies more of the
  dma-direct code between architectures. The now unused x86-only
  primitives are removed"

* 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs
  swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS
  dma/swiotlb: Remove swiotlb_{alloc,free}_coherent()
  dma/direct: Handle force decryption for DMA coherent buffers in common code
  dma/direct: Handle the memory encryption bit in common code
  dma/swiotlb: Remove swiotlb_set_mem_attributes()
  set_memory.h: Provide set_memory_{en,de}crypted() stubs
  x86/dma: Remove dma_alloc_coherent_gfp_flags()
  iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()
  iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
  x86/dma/amd_gart: Use dma_direct_{alloc,free}()
  x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA
  x86/dma: Use generic swiotlb_ops
  x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)
  x86/dma: Remove dma_alloc_coherent_mask()
2018-04-02 17:18:45 -07:00
Nicholas Piggin
db5ae1c155 powerpc/64s: Refine feature sets for little endian builds
This reduces vmlinux text size by 1kB and data by 1.5kB with a small
build!

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little
      endian possible mask as noticed by Nick.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 22:14:40 +10:00
Nicholas Piggin
471d7ff8b5 powerpc/64s: Remove POWER4 support
POWER4 has been broken since at least the change 49d09bf2a6
("powerpc/64s: Optimise MSR handling in exception handling"), which
requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
POWER4 supports ISA v2.00.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:50 +11:00
Nicholas Piggin
3735eb850e powerpc: Remove unused CPU_FTR_ARCH_201
The last usage was removed in c17b98cf60 ("KVM: PPC: Book3S HV:
Remove code for PPC970 processors") (Dec 2014).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:50 +11:00
Nicholas Piggin
15a3204d24 powerpc/64s: Set assembler machine type to POWER4
Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.

This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:49 +11:00
Nicholas Piggin
d50614fa45 powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLE
ALTIVEC and VSX features are not added by to default to the POWERx CPU
feature sets because they are intended to be enabled by firmware.
Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in
other the set for other CPUs, eg. PPC970.

But they should be added individually to the CPU_FTRS_POSSIBLE set,
because if we reduce the set of CPUs that are built-for they may
disappear from the possible mask.

It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features
should be used because they won't be present if compiled out.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add detail to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:48 +11:00
Nicholas Piggin
b842bd0f7a powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYS
It's not a bug to have features missing in CPU_FTR_ALWAYS, but it is a
missed opportunity for optimisation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:48 +11:00
Nicholas Piggin
3d4fbffdd7 powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug
Implement a new function to invoke stop, power9_offline_stop, which is
like power9_idle_stop but used by the cpu hotplug code.

Move KVM secondary state manipulation code to the offline case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:46 +11:00
Michael Ellerman
f437c51748 Merge branch 'topic/paca' into next
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.

This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
2018-03-31 09:09:36 +11:00
Aneesh Kumar K.V
872a100a49 powerpc/mm/hash: Don't memset pgd table if not needed
We need to zero-out pgd table only if we share the slab cache with
pud/pmd level caches. With the support of 4PB, we don't share the slab
cache anymore. Instead of removing the code completely hide it within
an #ifdef. We don't need to do this with any other page table level,
because they all allocate table of double the size and we take of
initializing the first half corrrectly during page table zap.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Consolidate multiple #if / #ifdef into one]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:39 +11:00
Aneesh Kumar K.V
c2b4d8b741 powerpc/mm/hash64: Increase the VA range
This patch increases the max virtual (effective) address value to 4PB.
With 4K page size config we continue to limit ourself to 64TB.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the H_PGTABLE_RANGE test, update it to work]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
f384796c40 powerpc/mm: Add support for handling > 512TB address in SLB miss
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.

The mmu_context_t is also updated to track the new extended_ids. To
support upto 4PB we need a total 8 contexts.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
      in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
1a2f778970 powerpc/mm/keys: Move pte bits to correct headers
Memory keys are supported only with hash translation mode. Instead of
using #ifdef in generic code move the key related pte bits to
respective headers

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:36 +11:00
Nicholas Piggin
0bfdf59890 powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
asm/barrier.h is not always included after asm/synch.h, which meant
it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would
be eieio when it should be lwsync. kernel/time/hrtimer.c is one case.

__SUBARCH_HAS_LWSYNC is only used in one place, so just fold it in
to where it's used. Previously with my small simulator config, 377
instances of eieio in the tree. After this patch there are 55.

Fixes: 46d075be58 ("powerpc: Optimise smp_wmb")
Cc: stable@vger.kernel.org # v2.6.29+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:34 +11:00
Nicholas Piggin
29ab6c4708 powerpc/mm: Pass node id into create_section_mapping
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:10 +11:00
Nicholas Piggin
59f577743d powerpc/64: Defer paca allocation until memory topology is discovered
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the dummy allocate_pacas() to fix 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:28 +11:00
Nicholas Piggin
9f593f131e powerpc/setup: Add cpu_to_phys_id array
Build an array that finds hardware CPU number from logical CPU
number in firmware CPU discovery. Use that rather than setting
paca of other CPUs directly, to begin with. Subsequent patch will
not have pacas allocated at this point.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:27 +11:00
Nicholas Piggin
9bd9be006c powerpc/mm/numa: move numa topology discovery earlier
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:26 +11:00
Nicholas Piggin
499dcd4137 powerpc/64s: Allocate LPPACAs individually
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:24 +11:00
Nicholas Piggin
d2e60075a3 powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:23 +11:00
Nicholas Piggin
8e0b634b13 powerpc/64s: Do not allocate lppaca if we are not virtualized
The "lppaca" is a structure registered with the hypervisor. This is
unnecessary when running on non-virtualised platforms. One field from
the lppaca (pmcregs_in_use) is also used by the host, so move the host
part out into the paca (lppaca field is still updated in
guest mode).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix non-pseries build with some #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:22 +11:00
Radim Krčmář
27aa896281 KVM PPC update for 4.17
- Improvements for the radix page fault handler for HV KVM on POWER9.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJavHlMAAoJEJ2a6ncsY3GfWTAIANbY6b8xvhUY6c3WM1dt6o78
 6MWJW1vCcsDYuM5yyIjYTds2xjY6vm7oWbo9thDdLIE9sWP2uKBDbdem8KxZb6JL
 t/tdzuLOYkB60BfwQL0z77UmLlHSQYF5RfJjbYVe5oXt7OU5TCe43udkHsT9QtLH
 HTpxvl7ebf2TIoex1+XqrD0eJ93tSOVFWB7Ay7WRUQu08CMEQRcHaszyMqdNfHfs
 LHoPwLgxyWf+7/zt2T4++ebfysQDFpQgsEuEBugXkaHkw6pSGi6R+BOrZwVVmcGm
 jiHq5+mdho3wL+B47Rt7UTjkpsMLyFbWR6TrhMD7y84/CislizKhEdnEYymKwH4=
 =0igD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

KVM PPC update for 4.17

- Improvements for the radix page fault handler for HV KVM on POWER9.
2018-03-29 20:20:13 +02:00
Michael Ellerman
95dff480bb Merge branch 'fixes' into next
Merge our fixes branch from the 4.16 cycle.

There were a number of important fixes merged, in particular some Power9
workarounds that we want in next for testing purposes. There's also been
some conflicting changes in the CPU features code which are best merged
and tested before going upstream.
2018-03-28 22:59:50 +11:00
Michael Ellerman
c0b346729b Merge branch 'topic/ppc-kvm' into next
Merge the DAWR series, which touches arch code and KVM code and may need
to be merged into the kvm-ppc tree.
2018-03-27 23:55:49 +11:00
Michael Neuling
9654153158 powerpc: Disable DAWR in the base POWER9 CPU features
Using the DAWR on POWER9 can cause xstops, hence we need to disable
it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:55:33 +11:00
Michael Neuling
e8ebedbf31 KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9
POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should
use the DAWR but since it's disabled there we can't.

This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR
is disabled.

Current Linux guests ignore this error, so they will silently not get
the DAWR (sigh). The same error code is being used by POWERVM in this
case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:55:32 +11:00
Michael Neuling
404b27d66e powerpc: Add ppc_breakpoint_available()
Add ppc_breakpoint_available() to determine if a breakpoint is
available currently via the DAWR or DABR.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:52:43 +11:00
Sam Bobroff
34a286a4ac powerpc/eeh: Add eeh_state_active() helper
Checking for a "fully active" device state requires testing two flag
bits, which is open coded in several places, so add a function to do
it.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:45:19 +11:00
Sam Bobroff
37fd812587 powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event()
Currently the EEH_PE_RECOVERING flag for a PE is managed by both the
caller and callee of eeh_handle_normal_event() (among other places not
considered here). This is complicated by the fact that the PE may
or may not have been invalidated by the call.

So move the callee's handling into eeh_handle_normal_event(), which
clarifies it and allows the return type to be changed to void (because
it no longer needs to indicate at the PE has been invalidated).

This should not change behaviour except in eeh_event_handler() where
it was previously possible to cause eeh_pe_state_clear() to be called
on an invalid PE, which is now avoided.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:58 +11:00
Sam Bobroff
6870178071 powerpc/eeh: Remove eeh_handle_event()
The function eeh_handle_event(pe) does nothing other than switching
between calling eeh_handle_normal_event(pe) and
eeh_handle_special_event(). However it is only called in two places,
one where pe can't be NULL and the other where it must be NULL (see
eeh_event_handler()) so it does nothing but obscure the flow of
control.

So, remove it.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:57 +11:00
Michael Ellerman
ff348355e9 powerpc/64s: Enhance the information in cpu_show_meltdown()
Now that we have the security feature flags we can make the
information displayed in the "meltdown" file more informative.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:53 +11:00
Michael Ellerman
9a868f6343 powerpc: Add security feature flags for Spectre/Meltdown
This commit adds security feature flags to reflect the settings we
receive from firmware regarding Spectre/Meltdown mitigations.

The feature names reflect the names we are given by firmware on bare
metal machines. See the hostboot source for details.

Arguably these could be firmware features, but that then requires them
to be read early in boot so they're available prior to asm feature
patching, but we don't actually want to use them for patching. We may
also want to dynamically update them in future, which would be
incompatible with the way firmware features work (at the moment at
least). So for now just make them separate flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:51 +11:00
Michael Ellerman
c4bc36628d powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
Add some additional values which have been defined for the
H_GET_CPU_CHARACTERISTICS hypercall.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 23:44:51 +11:00
Michael Ellerman
abf110f3e1 powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away (i.e., boot-time only).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 19:25:12 +11:00
Madhavan Srinivasan
b58064da04 powerpc/perf: Infrastructure to support addition of blacklisted events
Introduce code to support addition of blacklisted events for a
processor version. Blacklisted events are events that are known to not
count correctly on that CPU revision, and so should be prevented from
being counted so as to avoid user confusion.

A 'pointer' and 'int' variable to hold the number of events are added
to 'struct power_pmu', along with a generic function to loop through
the list to validate the given event. Generic function
'is_event_blacklisted' is called in power_pmu_event_init() to detect
and reject early.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27 19:25:10 +11:00
Michael Ellerman
a26cf1c9fe Merge branch 'topic/ppc-kvm' into next
This brings in two series from Paul, one of which touches KVM code and
may need to be merged into the kvm-ppc tree to resolve conflicts.
2018-03-24 08:43:18 +11:00
Paul Mackerras
4bb3c7a020 KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode).  Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads.  The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems.  This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional.  The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated.  The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present.  The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state.  Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths.  If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state.  This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active.  If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0.  The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:39:13 +11:00
Paul Mackerras
7672691a08 powerpc/powernv: Provide a way to force a core into SMT4 mode
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily.  This workaround is only needed when
running bare-metal.

This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state.  Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.

To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state.  If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0.  The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except
   ourselves (in fact we use an atomic_inc() in case more than
   one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
   requested_psscr field in the paca being 0.  If this is at
   least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
   being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
   we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:

- Once dont_stop is non-zero, requested_psccr can't go from zero to
  non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
  a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
  and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
  must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop.  The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.

Because some objected to incurring this extra latency on systems where
the XER[SO] bug is not relevant, I have put the test in
power9_idle_stop inside a feature section.  This means that
pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
probably hang the system.

In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:39:11 +11:00
Paul Mackerras
b5af4f2793 powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
processors which will be used to enable the hypervisor to assist
hardware with the handling of checkpointed register values while the
CPU is in suspend state, in order to work around hardware bugs.  The
hardware assistance for these workarounds introduced a new hardware
bug relating to the XER[SO] bit.  We add a separate feature bit for
this bug in case future chips fix it while still requiring the
hypervisor assistance with suspend state.

When the dt_cpu_ftrs subsystem is in use, the software assistance can
be enabled using a "tm-suspend-hypervisor-assist" node in the device
tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for
the XER[SO] bug.  In the absence of such nodes, a quirk enables both
for POWER9 "Nimbus" DD2.2 processors.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:39:09 +11:00
Paul Mackerras
9bbf0b576d powerpc: Free up CPU feature bits on 64-bit machines
This moves all the CPU feature bits that are only used on 32-bit
machines to the top 20 bits of the CPU feature word and arranges
for them to be defined only in 32-bit builds.  The features that
are common to 32-bit and 64-bit machines are moved to bits 0-11
of the CPU feature word.  This means that for 64-bit platforms,
bits 44-63 can now be used for new features that only exist on
64-bit machines.  (These bit numbers are counting from the right,
i.e. the LSB is bit 0.)

Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high
16 bits, we have to adjust some assembly code.  Also, CPU_FTR_EMB_HV
moved from the high 16 bits to the low 16 bits.

Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only
64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is
a CPU execution mode as opposed to being a page attribute.

With this we now have 20 free CPU feature bits on 64-bit machines.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:38:51 +11:00
Paul Mackerras
dd0efb3f11 powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the
bit.

The last usage was removed in 86d63363de ("powerpc/e500mc: Remove
dead L2 flushing code in idle_e500.S") (Jun 2015).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:38:00 +11:00
Paul Mackerras
c0d64cf9fe powerpc: Use feature bit for RTC presence rather than timebase presence
All PowerPC CPUs other than the original PPC601 have a timebase
register rather than the "real-time clock" (RTC) register that the
PPC601 (and the original POWER and POWER2 CPUs) had.  Currently
we have a CPU feature bit to indicate the presence of the timebase,
but it makes more sense to use a bit to indicate the unusual
situation rather than the common situation.  This therefore defines
a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and
arranges for it to be set on PPC601 systems.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:36:45 +11:00
Aneesh Kumar K.V
a5d4b5891c powerpc/mm: Fixup tlbie vs store ordering issue on POWER9
On POWER9, under some circumstances, a broadcast TLB invalidation
might complete before all previous stores have drained, potentially
allowing stale stores from becoming visible after the invalidation.
This works around it by doubling up those TLB invalidations which was
verified by HW to be sufficient to close the risk window.

This will be documented in a yet-to-be-published errata.

Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Enable the feature in the DT CPU features code for all Power9,
      rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-23 20:48:03 +11:00
Aneesh Kumar K.V
99491e2d0e powerpc/mm/radix: Remove unused code
These function are not used in the code. Remove them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-23 16:17:39 +11:00
Benjamin Herrenschmidt
aff6f8cb3e powerpc/mm: Add tracking of the number of coprocessors using a context
Currently, when using coprocessors (which use the Nest MMU), we
simply increment the active_cpu count to force all TLB invalidations
to be come broadcast.

Unfortunately, due to an errata in POWER9, we will need to know
more specifically that coprocessors are in use.

This maintains a separate copros counter in the MMU context for
that purpose.

NB. The commit mentioned in the fixes tag below is not at fault for
the bug we're fixing in this commit and the next, but this fix applies
on top the infrastructure it introduced.

Fixes: 03b8abedf4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-23 14:14:31 +11:00
Christoph Hellwig
b6e05477c1 dma/direct: Handle the memory encryption bit in common code
Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add
the memory encryption mask to the non-prefixed versions.  Use the
__-prefixed versions directly instead of clearing the mask again in
various places.

Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20 10:01:59 +01:00
Matt Brown
31513207ce powerpc: Remove unused flush_dcache_phys_range()
The flush_dcache_phys_range() function is no longer used in the
kernel. The last usage was removed in c40785ad30 ("powerpc/dart: Use
a cachable DART").

This patch removes the function and declaration.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Munge change log, include commit that removed last user]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-20 16:47:53 +11:00
Matt Brown
751ba79cc5 lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
This patch uses the vpermxor instruction to optimise the raid6 Q
syndrome. This instruction was made available with POWER8, ISA version
2.07. It allows for both vperm and vxor instructions to be done in a
single instruction. This has been tested for correctness on a ppc64le
vm with a basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the
/lib/raid6/test directory. These results are from an IBM Firestone
machine with ppc64le architecture. The benchmark results show a 35%
speed increase over the best existing algorithm for powerpc (altivec).
The raid6test has also been run on a big-endian ppc64 vm to ensure it
also works for big-endian architectures.

Performance benchmarks:
  raid6: altivecx4 gen() 18773 MB/s
  raid6: altivecx8 gen() 19438 MB/s

  raid6: vpermxor4 gen() 25112 MB/s
  raid6: vpermxor8 gen() 26279 MB/s

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Add VPERMXOR macro so we can build with old binutils]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-20 16:47:25 +11:00
Paul Mackerras
39c983ea0f KVM: PPC: Remove unused kvm_unmap_hva callback
Since commit fb1522e099 ("KVM: update to new mmu_notifier semantic
v2", 2017-08-31), the MMU notifier code in KVM no longer calls the
kvm_unmap_hva callback.  This removes the PPC implementations of
kvm_unmap_hva().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-03-19 10:08:29 +11:00
Khalid Aziz
9035cf9a97 mm: Add address parameter to arch_validate_prot()
A protection flag may not be valid across entire address space and
hence arch_validate_prot() might need the address a protection bit is
being set on to ensure it is a valid protection flag. For example, sparc
processors support memory corruption detection (as part of ADI feature)
flag on memory addresses mapped on to physical RAM but not on PFN mapped
pages or addresses mapped on to devices. This patch adds address to the
parameters being passed to arch_validate_prot() so protection bits can
be validated in the relevant context.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-18 07:38:47 -07:00
Nicholas Piggin
014a32b30e powerpc/mm/slice: remove radix calls to the slice code
This is a tidy up which removes radix MMU calls into the slice
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:08 +11:00
Nicholas Piggin
5709f7cfd8 powerpc/mm/slice: implement a slice mask cache
Calculating the slice mask can become a signifcant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in synch with
the slices psize arrays and slb_addr_limit.

On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice mask caches.

On POWER8, this increases vfork+exec+exit performance by 9.9%
and reduces time to mmap+munmap a 64kB page by 28%.

Reduces time to mmap+munmap by about 10% on 8xx.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:06 +11:00
Nicholas Piggin
1753dd1830 powerpc/mm/slice: Simplify and optimise slice context initialisation
The slice state of an mm gets zeroed then initialised upon exec.
This is the only caller of slice_set_user_psize now, so that can be
removed and instead implement a faster and simplified approach that
requires no locking or checking existing state.

This speeds up vfork+exec+exit performance on POWER8 by 3%.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:05 +11:00
Michael Ellerman
ab83dc794c powerpc/xmon: Move empty plpar_set_ciabr() into plpar_wrappers.h
Now that plpar_wrappers.h has an #ifdef PSERIES we can move the empty
version of plpar_set_ciabr() which xmon wants into there.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:04 +11:00
Michael Ellerman
7c09c1869c powerpc: Rename plapr routines to plpar
Back in 2013 we added some hypercall wrappers which misspelled
"plpar" (P-series Logical PARtition) as "plapr".

Visually they're hard to distinguish and it almost doesn't matter, but
it is confusing when grepping to miss some calls because of the typo.

They've also started spreading, so before they take over let's fix
them all to be "plpar".

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:04 +11:00
Michael Ellerman
5017e875e4 powerpc/pseries: Make plpar_wrappers.h safe to include when PSERIES=n
Currently plpar_wrappers.h is not safe to include when
CONFIG_PPC_PSERIES=n, or at least it can be depending on other config
options and so on.

Fix that by wrapping the entire content in an ifdef.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:04 +11:00
Michael Ellerman
16560e8832 powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.h
smp_query_cpu_stopped() and related #defines are currently in
plpar_wrappers.h. The function actually does an RTAS call, not an
hcall, and basically has nothing to do with plpar_wrappers.h

Move it into pseries.h, where it can easily be used by the only two
callers in pseries/smp.c and pseries/hotplug-cpu.c.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 23:43:03 +11:00
Mathieu Malaterre
e82d70cf96 powerpc/32: Add missing prototypes for (early|machine)_init()
early_init() and machine_init() have no prototype, add one in
asm-prototypes.h.

Fixes the following warnings (treated as error in W=1):
  arch/powerpc/kernel/setup_32.c:68:30: error: no previous prototype for ‘early_init’
  arch/powerpc/kernel/setup_32.c:99:21: error: no previous prototype for ‘machine_init’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
[mpe: Move them to asm-prototypes.h, drop other functions]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:42 +11:00
Mathieu Malaterre
ef85dffd42 powerpc: Avoid comparison of unsigned long >= 0 in __access_ok()
Rewrite function-like macro into regular static inline function to
avoid a warning during macro expansion.

Fix warning (treated as error in W=1):
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned expression >= 0 is always true
   (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)))))
                                   ^

Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:41 +11:00
Mathieu Malaterre
603b892200 powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid()
Rewrite comparison since all values compared are of type `unsigned long`.

Instead of using unsigned properties and rewriting the original code as:
(originally suggested by Segher Boessenkool <segher@kernel.crashing.org>)

  #define pfn_valid(pfn) \
               (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET))

Prefer a static inline function to make code as readable as possible.

Fix a warning (treated as error in W=1):
  arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
  #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
                                  ^

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:41 +11:00
Mathieu Malaterre
bf7fb32dd5 powerpc: Add missing prototypes for ppc_select() & ppc_fadvise64_64()
Add missing prototypes for ppc_select() & ppc_fadvise64_64() to header
asm-prototypes.h. Fix the following warnings (treated as errors in W=1)

  arch/powerpc/kernel/syscalls.c:87:1: error: no previous prototype for ‘ppc_select’
  arch/powerpc/kernel/syscalls.c:119:6: error: no previous prototype for ‘ppc_fadvise64_64’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:40 +11:00
Mathieu Malaterre
b0d876da1d powerpc: Add missing prototypes for hw_breakpoint_handler() & arch_unregister_hw_breakpoint()
In commit 5aae8a5370 ("powerpc, hw_breakpoints: Implement
hw_breakpoints for 64-bit server processors") function
hw_breakpoint_handler() and arch_unregister_hw_breakpoint() were added
without function prototypes in hw_breakpoint.h header.

Fix the following warning(s) (treated as error in W=1):
  arch/powerpc/kernel/hw_breakpoint.c:106:6: error: no previous prototype for ‘arch_unregister_hw_breakpoint’
  arch/powerpc/kernel/hw_breakpoint.c:209:5: error: no previous prototype for ‘hw_breakpoint_handler’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:39 +11:00
Mathieu Malaterre
0d60619e1c powerpc: Add missing prototype for sys_debug_setcontext()
In commit 81e7009ea4 ("powerpc: merge ppc signal.c and ppc64
signal32.c") the function sys_debug_setcontext was added without a
prototype.

Fix compilation warning (treated as error in W=1):
  arch/powerpc/kernel/signal_32.c:1227:5: error: no previous prototype for ‘sys_debug_setcontext’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:38 +11:00
Mathieu Malaterre
23a6d8b963 powerpc: Add missing prototype for init_IRQ()
A function init_IRQ() was added without a prototype declared in header
irq.h. Fix the following warning (treated as error in W=1):

  arch/powerpc/kernel/irq.c:662:13: error: no previous prototype for ‘init_IRQ’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:38 +11:00
Mathieu Malaterre
f5246862f8 powerpc: Add missing prototype for arch_irq_work_raise()
In commit 4f8b50bbbe ("irq_work, ppc: Fix up arch hooks") a new
function arch_irq_work_raise() was added without a prototype in header
irq_work.h.

Fix the following warning (treated as error in W=1):
  arch/powerpc/kernel/time.c:523:6: error: no previous prototype for ‘arch_irq_work_raise’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:37 +11:00
Mathieu Malaterre
fd70d9f96d powerpc: Add missing prototype for arch_dup_task_struct()
In commit 55ccf3fe3f ("fork: move the real prepare_to_copy() users to
arch_dup_task_struct()") a new arch_dup_task_struct() was added
without a prototype declared in thread_info.h header. Fix the
following warning (treated as error in W=1):

  arch/powerpc/kernel/process.c:1609:5: error: no previous prototype for ‘arch_dup_task_struct’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:37 +11:00
Mathieu Malaterre
848092faa0 powerpc: Add missing prototype for time_init()
The function time_init did not have a prototype defined in the time.h
header. Fix the following warning (treated as error in W=1):

  arch/powerpc/kernel/time.c:1068:13: error: no previous prototype for ‘time_init’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:36 +11:00
Mathieu Malaterre
8b604faff7 powerpc: Add missing prototype for hdec_interrupt
In commit dabe859ec6 ("powerpc: Give hypervisor decrementer interrupts
their own handler") an empty body function was added, but no prototype
was declared. Fix warning (treated as error in W=1):

  arch/powerpc/kernel/time.c:629:6: error: no previous prototype for ‘hdec_interrupt’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:35 +11:00
Mathieu Malaterre
45b4d27a38 powerpc: Add missing prototype for slb_miss_bad_addr()
In commit f0f558b131 ("powerpc/mm: Preserve CFAR value on SLB miss caused
by access to bogus address"), the function slb_miss_bad_addr() was added
without a prototype. This commit adds it.

Fix a warning (treated as error in W=1):
  arch/powerpc/kernel/traps.c:1498:6: error: no previous prototype for ‘slb_miss_bad_addr’

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:35 +11:00
Mathieu Malaterre
1cdf039bf8 powerpc/kernel: Make function __giveup_fpu() static
__giveup_fpu() is never called outside process.c, so it can be static.
That also means we don't need an empty definition in switch_to.h

Signed-off-by: Mathieu Malaterre <malat@debian.org>
[mpe: Also drop the empty version, rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:35 +11:00
Mathieu Malaterre
65e13c202d powerpc/epapr: Move register keyword at the beginning of declaration
Fix warning for all register unsigned long (0,3-12) that appear during W=1
compilation:

./arch/powerpc/include/asm/epapr_hcalls.h:479:2: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration]
  unsigned long register r[\d] asm("r[\d]");

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:32 +11:00
Philippe Bergheaud
d6a90bb83b powerpc/powernv: Enable tunneled operations
P9 supports PCI tunneled operations (atomics and as_notify). This
patch adds support for tunneled operations on powernv, with a new
API, to be called by device drivers:

pnv_pci_enable_tunnel()
   Enable tunnel operations, tell driver the 16-bit ASN indication
   used by kernel.

pnv_pci_disable_tunnel()
   Disable tunnel operations.

pnv_pci_set_tunnel_bar()
   Tell kernel the Tunnel BAR Response address used by driver.
   This function uses two new OPAL calls, as the PBCQ Tunnel BAR
   register is configured by skiboot.

pnv_pci_get_as_notify_info()
   Return the ASN info of the thread to be woken up.

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:30 +11:00
Wanpeng Li
a4429e53c9 KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATED
This patch introduces kvm_para_has_hint() to query for hints about
the configuration of the guests.  The first hint KVM_HINTS_DEDICATED,
is set if the guest has dedicated physical CPUs for each vCPU (i.e.
pinning and no over-commitment).  This allows optimizing spinlocks
and tells the guest to avoid PV TLB flush.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06 18:40:44 +01:00
Christophe Leroy
4bd13772ee powerpc/8xx: Increase number of slices to 64
On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode.

This patch increases the number of slices from 16 to 64 on the 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:24 +11:00
Christophe Leroy
15472423ce powerpc/mm/slice: Allow up to 64 low slices
While the implementation of the "slices" address space allows
a significant amount of high slices, it limits the number of
low slices to 16 due to the use of a single u64 low_slices_psize
element in struct mm_context_t

On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode. This means we could have at least 64 slices.

In order to override this limitation, this patch switches the
handling of low_slices_psize to char array as done already for
high_slices_psize.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:23 +11:00
Christophe Leroy
aa0ab02ba9 powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx
On the 8xx, the page size is set in the PMD entry and applies to
all pages of the page table pointed by the said PMD entry.

When an app has some regular pages allocated (e.g. see below) and tries
to mmap() a huge page at a hint address covered by the same PMD entry,
the kernel accepts the hint allthough the 8xx cannot handle different
page sizes in the same PMD entry.

10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc
10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc

mmap(0x10080000, 524288, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000

This results the app remaining forever in do_page_fault()/hugetlb_fault()
and when interrupting that app, we get the following warning:

[162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4
[162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W       4.14.6 #85
[162980.035744] task: c67e2c00 task.stack: c668e000
[162980.035783] NIP:  c000fe18 LR: c00e1eec CTR: c00f90c0
[162980.035830] REGS: c668fc20 TRAP: 0700   Tainted: G W        (4.14.6)
[162980.035854] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24044224 XER: 20000000
[162980.036003]
[162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000
[162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc
[162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48
[162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000
[162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4
[162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150
[162980.036861] Call Trace:
[162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable)
[162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150
[162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4
[162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8
[162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c
[162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc
[162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614
[162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244
[162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88
[162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4
[162980.037781] Instruction dump:
[162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94
[162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094
[162980.038216] ---[ end trace c0ceeca8e7a5800a ]---
[162980.038754] BUG: non-zero nr_ptes on freeing mm: 1
[162985.363322] BUG: non-zero nr_ptes on freeing mm: -1

In order to fix this, this patch uses the address space "slices"
implemented for BOOK3S/64 and enhanced to support PPC32 by the
preceding patch.

This patch modifies the context.id on the 8xx to be in the range
[1:16] instead of [0:15] in order to identify context.id == 0 as
not initialised contexts as done on BOOK3S

This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is
selected for the 8xx

Alltough we could in theory have as many slices as PMD entries, the
current slices implementation limits the number of low slices to 16.
This limitation is not preventing us to fix the initial issue allthough
it is suboptimal. It will be cured in a subsequent patch.

Fixes: 4b91428699 ("powerpc/8xx: Implement support of hugepages")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:23 +11:00
Christophe Leroy
db3a528db4 powerpc/mm/slice: Enhance for supporting PPC32
In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used.

The high slices use bitmaps. As bitmap functions are not prepared to
handle bitmaps of size 0, this patch ensures that bitmap functions
are called only when SLICE_NUM_HIGH is not nul.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:23 +11:00
Christophe Leroy
a3286f05bc powerpc/mm/slice: create header files dedicated to slices
In preparation for the following patch which will enhance 'slices'
for supporting PPC32 in order to fix an issue on hugepages on 8xx,
this patch takes out of page*.h all bits related to 'slices' and put
them into newly created slice.h header files.
While common parts go into asm/slice.h, subarch specific
parts go into respective books3s/64/slice.c and nohash/64/slice.c
'slices'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:22 +11:00
David Woodhouse
28f8f1833b powerpc/pci: Use generic pci_mmap_resource_range()
Commit f719582435 ("PCI: Add pci_mmap_resource_range() and use it for
ARM64") added this generic function with the intent of using it everywhere
and ultimately killing the old arch-specific implementations.

Remove the powerpc-specific pci_mmap_page_range() and use the generic
pci_mmap_resource_range() instead.

Powerpc can mmap I/O port space, so supply the powerpc-specific
pci_iobar_pfn() required to make that work.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-02-28 16:18:53 -06:00
Michael Ellerman
5539d31a04 powerpc/pseries: Fix duplicate firmware feature for DRC_INFO
We had a mid-air collision between two new firmware features, DRMEM_V2
and DRC_INFO, and they ended up with the same value.

No one's actually reported any problems, presumably because the new
firmware that supports both properties is not widely available, and
the two properties tend to be enabled together.

Still if we ever had one enabled but not the other, the bugs that
could result are many and varied. So fix it.

Fixes: 3f38000eda ("powerpc/firmware: Add definitions for new drc-info firmware feature")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
2018-02-22 14:32:32 +11:00
Corentin Labbe
c1e150ceb6 powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
When CONFIG_NUMA is not set, the build fails with:

  arch/powerpc/platforms/pseries/hotplug-cpu.c:335:4:
  error: déclaration implicite de la fonction « update_numa_cpu_lookup_table »

So we have to add update_numa_cpu_lookup_table() as an empty function
when CONFIG_NUMA is not set.

Fixes: 1d9a090783 ("powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-15 10:10:02 +11:00
Linus Torvalds
694a20dae6 powerpc fixes for 4.16 #2
A larger batch of fixes than we'd like. Roughly 1/3 fixes for new code, 1/3
 fixes for stable and 1/3 minor things.
 
 There's four commits fixing bugs when using 16GB huge pages on hash, caused by
 some of the preparatory changes for pkeys.
 
 Two fixes for bugs in the enhanced IRQ soft masking for local_t, one of which
 broke KVM in some circumstances.
 
 Four fixes for Power9. The most bizarre being a bug where futexes stopped
 working because a NULL pointer dereference didn't trap during early boot (it
 aliased the kernel mapping). A fix for memory hotplug when using the Radix MMU,
 and a fix for live migration of guests using the Radix MMU.
 
 Two fixes for hotplug on pseries machines. One where we weren't correctly
 updating NUMA info when CPUs are added and removed. And the other fixes
 crashes/hangs seen when doing memory hot remove during boot, which is apparently
 a thing people do.
 
 Finally a handful of build fixes for obscure configs and other minor fixes.
 
 Thanks to:
   Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin Ian King, Daniel
   Henrique Barboza, Florian Weimer, Guenter Roeck, Harish, Laurent Vivier,
   Madhavan Srinivasan, Mauricio Faria de Oliveira, Nathan Fontenot, Nicholas
   Piggin, Sam Bobroff.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJahDTmExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAd
 chAAtVe8hmkEJefTbU63GBeqva0JHSiTu2DENZAlN/epWtbtyl05PLETMdTcwGCv
 nK2zzR+xbSFN1DzZK8KQfDBW33McKZE+YkHwYOC8Kff/N0SKdHK4zvxYr7FTZGzG
 9uSG5vrxVEsPLT/yANabl0d0vKWMsJ1jZquvJAU0eLNUbA/skGjEPADtXqYQUXiA
 EnW4xeczsMLjuzTleoRqrBx74Gulovuq9LVAjfDvkydWlCU9MQkrodCgP0V2hQtw
 RAJ/QLY+NS/vMCBnvVOGBaKzIqrfeQTHF3P0j4pyBeBq/2kNuidM5n25uoc31wUq
 DE4Ebe2FJA6CHP5KEyf7dr9y7gsks/ak3/CKs+l6Yz3/0BqenEMhu6WKJ1tgf9cC
 qAmi1dIjtpw6JZ6baCbkloUdAGNjKVfLWB9ld9VIfg0C+C3y4L7+TKJukxrCBGI6
 hopfT/3p8xUdla3euiRXRLZzajyKDGrqk71hk5J/J0ChXfWB0B51X0F6NIfH41Mn
 YsVUQ95p3zS79Pl942ijGScFX/bNVLfEEGzlI/nwU/wbTxF5g/XNXm5PjBsGSr/W
 zFcCwCpFV2b/kypQoxQA5CbrKRCLOleDA/lLOxW/1NMYOQsNj05DM9wYAw5Bl+lX
 AVj2c5jM9heNN4scxDiufRNfqZbyjZ4fFUpXLNqs7N5vcks=
 =BmuL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A larger batch of fixes than we'd like. Roughly 1/3 fixes for new
  code, 1/3 fixes for stable and 1/3 minor things.

  There's four commits fixing bugs when using 16GB huge pages on hash,
  caused by some of the preparatory changes for pkeys.

  Two fixes for bugs in the enhanced IRQ soft masking for local_t, one
  of which broke KVM in some circumstances.

  Four fixes for Power9. The most bizarre being a bug where futexes
  stopped working because a NULL pointer dereference didn't trap during
  early boot (it aliased the kernel mapping). A fix for memory hotplug
  when using the Radix MMU, and a fix for live migration of guests using
  the Radix MMU.

  Two fixes for hotplug on pseries machines. One where we weren't
  correctly updating NUMA info when CPUs are added and removed. And the
  other fixes crashes/hangs seen when doing memory hot remove during
  boot, which is apparently a thing people do.

  Finally a handful of build fixes for obscure configs and other minor
  fixes.

  Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin
  Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck,
  Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de
  Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff"

* tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix to use ucontext_t instead of struct ucontext
  powerpc/kdump: Fix powernv build break when KEXEC_CORE=n
  powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug
  powerpc/mm/hash64: Zero PGD pages on allocation
  powerpc/mm/hash64: Store the slot information at the right offset for hugetlb
  powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled
  powerpc/mm: Fix crashes with 16G huge pages
  powerpc/mm: Flush radix process translations when setting MMU type
  powerpc/vas: Don't set uses_vas for kernel windows
  powerpc/pseries: Enable RAS hotplug events later
  powerpc/mm/radix: Split linear mapping on hot-unplug
  powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID
  ocxl: fix signed comparison with less than zero
  powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking
  powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro
  powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove
2018-02-14 10:06:41 -08:00
Guenter Roeck
9109617545 powerpc/kdump: Fix powernv build break when KEXEC_CORE=n
If KEXEC_CORE is not enabled, powernv builds fail as follows.

  arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
  arch/powerpc/platforms/powernv/smp.c:236:4: error:
  	implicit declaration of function 'crash_ipi_callback'

Add dummy function calls, similar to kdump_in_progress(), to solve the
problem.

Fixes: 4145f35864 ("powernv/kdump: Fix cases where the kdump kernel can get HMI's")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:39:37 +11:00
Guenter Roeck
82343484a2 powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug
Commit e67e02a544 ("powerpc/pseries: Fix cpu hotplug crash with
memoryless nodes") adds an unconditional call to
find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR
is enabled. This results in the following build error if this is not
the case.

  arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
  arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
  			undefined reference to `.find_and_online_cpu_nid'

Follow the guideline provided by similar functions and provide a dummy
function if CONFIG_PPC_SPLPAR is not enabled. This also moves the
external function declaration into an include file where it should be.

Fixes: e67e02a544 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[mpe: Change subject to emphasise the build fix]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:38:39 +11:00
Aneesh Kumar K.V
fc5c2f4a55 powerpc/mm/hash64: Zero PGD pages on allocation
On powerpc we allocate page table pages from slab caches of different
sizes. Currently we have a constructor that zeroes out the objects when
we allocate them for the first time.

We expect the objects to be zeroed out when we free the the object
back to slab cache. This happens in the unmap path. For hugetlb pages
we call huge_pte_get_and_clear() to do that.

With the current configuration of page table size, both PUD and PGD
level tables are allocated from the same slab cache. At the PUD level,
we use the second half of the table to store the slot information. But
we never clear that when unmapping.

When such a freed object is then allocated for a PGD page, the second
half of the page table page will not be zeroed as expected. This
results in a kernel crash.

Fix it by always clearing PGD pages when they're allocated.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Change log wording and formatting, add whitespace]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V
ff31e10546 powerpc/mm/hash64: Store the slot information at the right offset for hugetlb
The hugetlb pte entries are at the PMD and PUD level, so we can't use
PTRS_PER_PTE to find the second half of the page table. Use the right
offset for PUD/PMD to get to the second half of the table.

Fixes: bf9a95f9a6 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V
4a7aa4fecb powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled
We use the second half of the page table to store slot information, so we must
allocate it always if hugetlb is possible.

Fixes: bf9a95f9a6 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V
fae2211697 powerpc/mm: Fix crashes with 16G huge pages
To support memory keys, we moved the hash pte slot information to the
second half of the page table. This was ok with PTE entries at level
4 (PTE page) and level 3 (PMD). We already allocate larger page table
pages at those levels to accomodate extra details. For level 4 we
already have the extra space which was used to track 4k hash page
table entry details and at level 3 the extra space was allocated to
track the THP details.

With hugetlbfs PTE, we used this extra space at the PMD level to store
the slot details. But we also support hugetlbfs PTE at PUD level for
16GB pages and PUD level page didn't allocate extra space. This
resulted in memory corruption.

Fix this by allocating extra space at PUD level when HUGETLB is
enabled.

Fixes: bf9a95f9a6 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-13 22:37:47 +11:00
Linus Torvalds
15303ba5d1 KVM changes for 4.16
ARM:
 - Include icache invalidation optimizations, improving VM startup time
 
 - Support for forwarded level-triggered interrupts, improving
   performance for timers and passthrough platform devices
 
 - A small fix for power-management notifiers, and some cosmetic changes
 
 PPC:
 - Add MMIO emulation for vector loads and stores
 
 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
   requiring the complex thread synchronization of older CPU versions
 
 - Improve the handling of escalation interrupts with the XIVE interrupt
   controller
 
 - Support decrement register migration
 
 - Various cleanups and bugfixes.
 
 s390:
 - Cornelia Huck passed maintainership to Janosch Frank
 
 - Exitless interrupts for emulated devices
 
 - Cleanup of cpuflag handling
 
 - kvm_stat counter improvements
 
 - VSIE improvements
 
 - mm cleanup
 
 x86:
 - Hypervisor part of SEV
 
 - UMIP, RDPID, and MSR_SMI_COUNT emulation
 
 - Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
 
 - Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
   features
 
 - Show vcpu id in its anonymous inode name
 
 - Many fixes and cleanups
 
 - Per-VCPU MSR bitmaps (already merged through x86/pti branch)
 
 - Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
 Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
 Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
 xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
 /9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
 FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
 =C/uD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:

   - icache invalidation optimizations, improving VM startup time

   - support for forwarded level-triggered interrupts, improving
     performance for timers and passthrough platform devices

   - a small fix for power-management notifiers, and some cosmetic
     changes

  PPC:

   - add MMIO emulation for vector loads and stores

   - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
     requiring the complex thread synchronization of older CPU versions

   - improve the handling of escalation interrupts with the XIVE
     interrupt controller

   - support decrement register migration

   - various cleanups and bugfixes.

  s390:

   - Cornelia Huck passed maintainership to Janosch Frank

   - exitless interrupts for emulated devices

   - cleanup of cpuflag handling

   - kvm_stat counter improvements

   - VSIE improvements

   - mm cleanup

  x86:

   - hypervisor part of SEV

   - UMIP, RDPID, and MSR_SMI_COUNT emulation

   - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit

   - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
     AVX512 features

   - show vcpu id in its anonymous inode name

   - many fixes and cleanups

   - per-VCPU MSR bitmaps (already merged through x86/pti branch)

   - stable KVM clock when nesting on Hyper-V (merged through
     x86/hyperv)"

* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
  KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
  KVM: PPC: Book3S HV: Branch inside feature section
  KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
  KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
  KVM: PPC: Book3S PR: Fix broken select due to misspelling
  KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
  KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
  KVM: PPC: Book3S HV: Drop locks before reading guest memory
  kvm: x86: remove efer_reload entry in kvm_vcpu_stat
  KVM: x86: AMD Processor Topology Information
  x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
  kvm: embed vcpu id to dentry of vcpu anon inode
  kvm: Map PFN-type memory regions as writable (if possible)
  x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
  KVM: arm/arm64: Fixup userspace irqchip static key optimization
  KVM: arm/arm64: Fix userspace_irqchip_in_use counting
  KVM: arm/arm64: Fix incorrect timer_is_pending logic
  MAINTAINERS: update KVM/s390 maintainers
  MAINTAINERS: add Halil as additional vfio-ccw maintainer
  MAINTAINERS: add David as a reviewer for KVM/s390
  ...
2018-02-10 13:16:35 -08:00
Radim Krčmář
1ab03c072f Second PPC KVM update for 4.16
Seven fixes that are either trivial or that address bugs that people
 are actually hitting.  The main ones are:
 
 - Drop spinlocks before reading guest memory
 
 - Fix a bug causing corruption of VCPU state in PR KVM with preemption
   enabled
 
 - Make HPT resizing work on POWER9
 
 - Add MMIO emulation for vector loads and stores, because guests now
   use these instructions in memcpy and similar routines.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJafWn0AAoJEJ2a6ncsY3GfaMsIANF0hQD8SS78WNKnoy0vnZ/X
 PUXdjwHEsfkg5KdQ7o0oaa2BJHHqO3vozddmMiG14r2L1mNCHJpnVZCVV0GaEJcZ
 eU8++OPK6yrsPNNpAjnrtQ0Vk4LwzoT0bftEjS3TtLt1s2uSo+R1+HLmxbxGhQUX
 bZngo9wQ3cjUfAXLrPtAVhE5wTmgVOiufVRyfRsBRdFzRsAWqjY4hBtJAfwdff4r
 AA5H0RCrXO6e1feKr5ElU8KzX6b7IjH9Xu868oJ1r16zZfE05PBl1X5n4XG7XDm7
 xWvs8uLAB7iRv2o/ecFznYJ+Dz1NCBVzD0RmAUTqPCcVKDrxixaTkqMPFW97IAA=
 =HOJR
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

Second PPC KVM update for 4.16

Seven fixes that are either trivial or that address bugs that people
are actually hitting.  The main ones are:

- Drop spinlocks before reading guest memory

- Fix a bug causing corruption of VCPU state in PR KVM with preemption
  enabled

- Make HPT resizing work on POWER9

- Add MMIO emulation for vector loads and stores, because guests now
  use these instructions in memcpy and similar routines.
2018-02-09 22:03:06 +01:00
Jose Ricardo Ziviani
09f984961c KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
This patch provides the MMIO load/store vector indexed
X-Form emulation.

Instructions implemented:
lvx: the quadword in storage addressed by the result of EA &
0xffff_ffff_ffff_fff0 is loaded into VRT.

stvx: the contents of VRS are stored into the quadword in storage
addressed by the result of EA & 0xffff_ffff_ffff_fff0.

Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09 16:51:51 +11:00
Nicholas Piggin
6cc3f91bf6 powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking
The soft IRQ masking code has to hard-disable interrupts in cases
where the exception is not cleared by the masked handler. External
interrupts used this approach for soft masking. Now recently PMU
interrupts do the same thing.

The soft IRQ masking code additionally allowed for interrupt handlers
to hard-enable interrupts after soft-disabling them. The idea is to
allow PMU interrupts through to profile interrupt handlers.

So when interrupts are being replayed when there is a pending
interrupt that requires hard-disabling, there is a test to prevent
those handlers from hard-enabling them if there is a pending external
interrupt. may_hard_irq_enable() handles this.

After f442d00480 ("powerpc/64s: Add support to mask perf interrupts
and replay them"), may_hard_irq_enable() could prematurely enable
MSR[EE] when a PMU exception exists, which would result in the
interrupt firing again while masked, and MSR[EE] being disabled again.

I haven't seen that this could cause a serious problem, but it's
more consistent to handle these soft-masked interrupts in the same
way. So introduce a define for all types of interrupts that require
MSR[EE] masking in their soft-disable handlers, and use that in
may_hard_irq_enable().

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08 23:56:10 +11:00
Madhavan Srinivasan
5c11d1e52d powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro
Commit f14e953b19 ("powerpc/64s: Add support to take additional
parameter in MASKABLE_* macro") messed up MASKABLE_RELON_EXCEPTION_HV_OOL
macro by adding the wrong SOFTEN test which caused guest kernel crash
at boot. Patch to fix the macro to use SOFTEN_TEST_HV instead of
SOFTEN_NOTEST_HV.

Fixes: f14e953b19 ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fix-Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08 23:56:10 +11:00
Nathan Fontenot
1d9a090783 powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove
When DLPAR removing a CPU, the unmapping of the cpu from a node in
unmap_cpu_from_node() should also invalidate the CPUs entry in the
numa_cpu_lookup_table. There is not a guarantee that on a subsequent
DLPAR add of the CPU the associativity will be the same and thus
could be in a different node. Invalidating the entry in the
numa_cpu_lookup_table causes the associativity to be read from the
device tree at the time of the add.

The current behavior of not invalidating the CPUs entry in the
numa_cpu_lookup_table can result in scenarios where the the topology
layout of CPUs in the partition does not match the device tree
or the topology reported by the HMC.

This bug looks like it was introduced in 2004 in the commit titled
"ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the
linux-fullhist tree. Hence tag it for all stable releases.

Cc: stable@vger.kernel.org
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-08 23:56:10 +11:00
Ingo Molnar
8284507916 Merge branch 'linus' into sched/urgent, to resolve conflicts
Conflicts:
	arch/arm64/kernel/entry.S
	arch/x86/Kconfig
	include/linux/sched/mm.h
	kernel/fork.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06 21:12:31 +01:00
Mathieu Desnoyers
c5f58bd58f membarrier: Provide GLOBAL_EXPEDITED command
Allow expedited membarrier to be used for data shared between processes
through shared memory.

Processes wishing to receive the membarriers register with
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue
membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This allows extremely simple kernel-level implementation: we have almost
everything we need with the PRIVATE_EXPEDITED barrier code. All we need
to do is to add a flag in the mm_struct that will be used to check
whether we need to send the IPI to the current thread of each CPU.

There is a slight downside to this approach compared to targeting
specific shared memory users: when performing a membarrier operation,
all registered "global" receivers will get the barrier, even if they
don't share a memory mapping with the sender issuing
MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This registration approach seems to fit the requirement of not
disturbing processes that really deeply care about real-time: they
simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
compatibility.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05 21:34:31 +01:00
Mathieu Desnoyers
3ccfebedd8 powerpc, membarrier: Skip memory barrier in switch_mm()
Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Threads targeting the same VM but which belong to different thread
groups is a tricky case. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05 21:34:02 +01:00