Commit graph

86 commits

Author SHA1 Message Date
Nicholas Piggin
cf0b0e3712 KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a
reserved value on POWER7/8. On POWER8 this invalidates the SLB entries
above index 0, similarly to SLBIA IH=0.

If the SLB entries are invalidated, and then the guest is bypassed, the
host SLB does not get re-loaded, so the bolted entries above 0 will be
lost. This can result in kernel stack access causing a SLB fault.

Kernel stack access causing a SLB fault was responsible for the infamous
mega bug (search "Fix SLB reload bug"). Although since commit
48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") that
starts using the kernel stack in the SLB miss handler, it might only
result in an infinite loop of SLB faults. In any case it's a bug.

Fix this by only executing the instruction on >= POWER9 where IH=7 is
defined not to invalidate the SLB. POWER7/8 don't require this ERAT
flush.

Fixes: 5008711259 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
2021-11-24 21:00:36 +11:00
Sebastian Andrzej Siewior
5ae36401ca powerpc: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de
2021-08-10 23:14:56 +10:00
Suraj Jitindar Singh
77bbbc0cf8 KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors
The POWER9 vCPU TLB management code assumes all threads in a core share
a TLB, and that TLBIEL execued by one thread will invalidate TLBs for
all threads. This is not the case for SMT8 capable POWER9 and POWER10
(big core) processors, where the TLB is split between groups of threads.
This results in TLB multi-hits, random data corruption, etc.

Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine
which siblings share TLBs, and use that in the guest TLB flushing code.

[npiggin@gmail.com: add changelog and comment]

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com
2021-06-21 09:22:34 +10:00
Nicholas Piggin
2ce008c8b2 KVM: PPC: Book3S HV: Remove unused nested HV tests in XICS emulation
Commit f3c18e9342 ("KVM: PPC: Book3S HV: Use XICS hypercalls when
running as a nested hypervisor") added nested HV tests in XICS
hypercalls, but not all are required.

* icp_eoi is only called by kvmppc_deliver_irq_passthru which is only
  called by kvmppc_check_passthru which is only caled by
  kvmppc_read_one_intr.

* kvmppc_read_one_intr is only called by kvmppc_read_intr which is only
  called by the L0 HV rmhandlers code.

* kvmhv_rm_send_ipi is called by:
  - kvmhv_interrupt_vcore which is only called by kvmhv_commence_exit
    which is only called by the L0 HV rmhandlers code.
  - icp_send_hcore_msg which is only called by icp_rm_set_vcpu_irq.
  - icp_rm_set_vcpu_irq which is only called by icp_rm_try_update
  - icp_rm_set_vcpu_irq is not nested HV safe because it writes to
    LPCR directly without a kvmhv_on_pseries test. Nested handlers
    should not in general be using the rm handlers.

The important test seems to be in kvmppc_ipi_thread, which sends the
virt-mode H_IPI handler kick to use smp_call_function rather than
msgsnd.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-26-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
dcbac73a5b KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers
Now that the P7/8 path no longer supports radix, real-mode handlers
do not need to deal with being called in virt mode.

This change effectively reverts commit acde25726b ("KVM: PPC: Book3S
HV: Add radix checks in real-mode hypercall handlers").

It removes a few more real-mode tests in rm hcall handlers, which
allows the indirect ops for the xive module to be removed from the
built-in xics rm handlers.

kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more
descriptive and consistent with other rm handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin
732f21a305 KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR
Rather than clear the HV bit from the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to set the bit.

The HV clear is kept in guest entry for now, but a future patch will
warn if it is set.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-13-npiggin@gmail.com
2021-04-12 13:36:24 +10:00
Nicholas Piggin
946cf44ac6 KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR
Rather than add the ME bit to the MSR at guest entry, make it clear
that the hypervisor does not allow the guest to clear the bit.

The ME set is kept in guest entry for now, but a future patch will
warn if it's not present.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-12-npiggin@gmail.com
2021-04-12 13:36:24 +10:00
Linus Torvalds
b12b472496 powerpc updates for 5.12
A large series adding wrappers for our interrupt handlers, so that irq/nmi/user
 tracking can be isolated in the wrappers rather than spread in each handler.
 
 Conversion of the 32-bit syscall handling into C.
 
 A series from Nick to streamline our TLB flushing when using the Radix MMU.
 
 Switch to using queued spinlocks by default for 64-bit server CPUs.
 
 A rework of our PCI probing so that it happens later in boot, when more generic
 infrastructure is available.
 
 Two small fixes to allow 32-bit little-endian processes to run on 64-bit
 kernels.
 
 Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira
   Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy,
   Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh
   Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus
   Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
   O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan
   Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng
   Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs
 Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz
 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT
 iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ
 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B
 eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883
 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW
 NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo
 TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK
 E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK
 nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB
 kpJkjOjWNJU/PQ==
 =dB2S
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A large series adding wrappers for our interrupt handlers, so that
   irq/nmi/user tracking can be isolated in the wrappers rather than
   spread in each handler.

 - Conversion of the 32-bit syscall handling into C.

 - A series from Nick to streamline our TLB flushing when using the
   Radix MMU.

 - Switch to using queued spinlocks by default for 64-bit server CPUs.

 - A rework of our PCI probing so that it happens later in boot, when
   more generic infrastructure is available.

 - Two small fixes to allow 32-bit little-endian processes to run on
   64-bit kernels.

 - Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.

* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
  powerpc/perf: Adds support for programming of Thresholding in P10
  powerpc/pci: Remove unimplemented prototypes
  powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
  powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
  powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
  powerpc/64: Fix stack trace not displaying final frame
  powerpc/time: Remove get_tbl()
  powerpc/time: Avoid using get_tbl()
  spi: mpc52xx: Avoid using get_tbl()
  powerpc/syscall: Avoid storing 'current' in another pointer
  powerpc/32: Handle bookE debugging in C in syscall entry/exit
  powerpc/syscall: Do not check unsupported scv vector on PPC32
  powerpc/32: Remove the counter in global_dbcr0
  powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
  powerpc/syscall: implement system call entry/exit logic in C for PPC32
  powerpc/32: Always save non volatile GPRs at syscall entry
  powerpc/syscall: Change condition to check MSR_RI
  powerpc/syscall: Save r3 in regs->orig_r3
  powerpc/syscall: Use is_compat_task()
  powerpc/syscall: Make interrupt.c buildable on PPC32
  ...
2021-02-22 14:34:00 -08:00
Nicholas Piggin
b1b1697ae0 KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
This reverts much of commit c01015091a ("KVM: PPC: Book3S HV: Run HPT
guests on POWER9 radix hosts"), which was required to run HPT guests on
RPT hosts on early POWER9 CPUs without support for "mixed mode", which
meant the host could not run with MMU on while guests were running.

This code has some corner case bugs, e.g., when the guest hits a machine
check or HMI the primary locks up waiting for secondaries to switch LPCR
to host, which they never do. This could all be fixed in software, but
most CPUs in production have mixed mode support, and those that don't
are believed to be all in installations that don't use this capability.
So simplify things and remove support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Nicholas Piggin
3a96570ffc powerpc: convert interrupt handlers to use wrappers
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
2021-02-09 00:02:12 +11:00
Aneesh Kumar K.V
e80639405c powerpc/mm: Update tlbiel loop on POWER10
With POWER10, single tlbiel instruction invalidates all the congruence
class of the TLB and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
2020-11-19 14:50:15 +11:00
Mike Rapoport
04ba0a923f KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
Patch series "memblock: seasonal cleaning^w cleanup", v3.

These patches simplify several uses of memblock iterators and hide some of
the memblock implementation details from the rest of the system.

This patch (of 17):

The memory size calculation in kvm_cma_reserve() traverses memblock.memory
rather than simply call memblock_phys_mem_size().  The comment in that
function suggests that at some point there should have been call to
memblock_analyze() before memblock_phys_mem_size() could be used.  As of
now, there is no memblock_analyze() at all and memblock_phys_mem_size()
can be used as soon as cold-plug memory is registered with memblock.

Replace loop over memblock.memory with a call to memblock_phys_mem_size().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lkml.kernel.org/r/20200818151634.14343-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20200818151634.14343-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:34 -07:00
Aneesh Kumar K.V
a5a8b258da powerpc/kvm/cma: Improve kernel log during boot
Current kernel gives:

[    0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[    0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[    0.000000] cma: Reserved 16384 MiB at 0x0000001800000000

With the fix

[    0.000000] kvm_cma_reserve: reserving 26214 MiB for global area
[    0.000000] cma: Reserved 26224 MiB at 0x0000007959000000
[    0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node
[    0.000000] cma: Reserved 16384 MiB at 0x0000001800000000

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713150749.25245-2-aneesh.kumar@linux.ibm.com
2020-07-29 21:09:37 +10:00
Nicholas Piggin
6a13cb0c37 KVM: PPC: Book3S HV: Implement LPCR[AIL]=3 mode for injected interrupts
kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which
can result in the guest receiving interrupts as if LPCR[AIL]=0
contrary to the ISA.

In practice, Linux guests cope with this deviation, but it should be
fixed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22 16:29:02 +11:00
Nicholas Piggin
268f4ef995 KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest delivery
This consolidates the HV interrupt delivery logic into one place.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22 16:29:02 +11:00
Linus Torvalds
192f0f8e9d powerpc updates for 5.3
Notable changes:
 
  - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well
    as some other functions only used by drivers that haven't (yet?) made it
    upstream.
 
  - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e
    mem: ...) which could lead to register corruption and kernel crashes.
 
  - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc
    when using the Radix MMU.
 
  - A large but incremental rewrite of our exception handling code to use gas
    macros rather than multiple levels of nested CPP macros.
 
 And the usual small fixes, cleanups and improvements.
 
 Thanks to:
   Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju
   T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater,
   Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig,
   Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R.
   Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg
   Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
   Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao,
   Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria,
   Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun
   Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung
   Bauermann, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdKVoLAAoJEFHr6jzI4aWA0kIP/A6shIbbE7H5W2hFrqt/PPPK
 3+VrvPKbOFF+W6hcE/RgSZmEnUo0svdNjHUd/eMfFS1vb/uRt2QDdrsHUNNwURQL
 M2mcLXFwYpnjSjb/XMgDbHpAQxjeGfTdYLonUIejN7Rk8KQUeLyKQ3SBn6kfMc46
 DnUUcPcjuRGaETUmVuZZ4e40ZWbJp8PKDrSJOuUrTPXMaK5ciNbZk5mCWXGbYl6G
 BMQAyv4ld/417rNTjBEP/T2foMJtioAt4W6mtlgdkOTdIEZnFU67nNxDBthNSu2c
 95+I+/sML4KOp1R4yhqLSLIDDbc3bg3c99hLGij0d948z3bkSZ8bwnPaUuy70C4v
 U8rvl/+N6C6H3DgSsPE/Gnkd8DnudqWY8nULc+8p3fXljGwww6/Qgt+6yCUn8BdW
 WgixkSjKgjDmzTw8trIUNEqORrTVle7cM2hIyIK2Q5T4kWzNQxrLZ/x/3wgoYjUa
 1KwIzaRo5JKZ9D3pJnJ5U+knE2/90rJIyfcp0W6ygyJsWKi2GNmq1eN3sKOw0IxH
 Tg86RENIA/rEMErNOfP45sLteMuTR7of7peCG3yumIOZqsDVYAzerpvtSgip2cvK
 aG+9HcYlBFOOOF9Dabi8GXsTBLXLfwiyjjLSpA9eXPwW8KObgiNfTZa7ujjTPvis
 4mk9oukFTFUpfhsMmI3T
 =3dBZ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
     as well as some other functions only used by drivers that haven't
     (yet?) made it upstream.

   - A fix for a bug in our handling of hardware watchpoints (eg. perf
     record -e mem: ...) which could lead to register corruption and
     kernel crashes.

   - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
     vmalloc when using the Radix MMU.

   - A large but incremental rewrite of our exception handling code to
     use gas macros rather than multiple levels of nested CPP macros.

  And the usual small fixes, cleanups and improvements.

  Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
  Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
  Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
  Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
  Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
  Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
  Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
  Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
  Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
  Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
  Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
  Jitindar Singh, Thiago Jung Bauermann, YueHaibing"

* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
  powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
  powerpc/eeh: Handle hugepages in ioremap space
  ocxl: Update for AFU descriptor template version 1.1
  powerpc/boot: pass CONFIG options in a simpler and more robust way
  powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
  powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
  powerpc/module64: Use symbolic instructions names.
  powerpc/module32: Use symbolic instructions names.
  powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
  powerpc/module64: Fix comment in R_PPC64_ENTRY handling
  powerpc/boot: Add lzo support for uImage
  powerpc/boot: Add lzma support for uImage
  powerpc/boot: don't force gzipped uImage
  powerpc/8xx: Add microcode patch to move SMC parameter RAM.
  powerpc/8xx: Use IO accessors in microcode programming.
  powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
  powerpc/8xx: refactor programming of microcode CPM params.
  powerpc/8xx: refactor printing of microcode patch name.
  powerpc/8xx: Refactor microcode write
  powerpc/8xx: refactor writing of CPM microcode arrays
  ...
2019-07-13 16:08:36 -07:00
Nicholas Piggin
6c46fcce39 powerpc/64s/radix: keep kernel ERAT over local process/guest invalidates
ISA v3.0 radix modes provide SLBIA variants which can invalidate ERAT
for effPID!=0 or for effLPID!=0, which allows user and guest
invalidations to retain kernel/host ERAT entries.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03 15:19:35 +10:00
Nicholas Piggin
fe7946ce08 powerpc/64s: Rename PPC_INVALIDATE_ERAT to PPC_ISA_3_0_INVALIDATE_ERAT
This makes it clear to the caller that it can only be used on POWER9
and later CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use "ISA_3_0" rather than "ARCH_300"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03 15:19:35 +10:00
Linus Torvalds
a8282bf087 powerpc fixes for 5.2 #5
Seven fixes, all for bugs introduced this cycle.
 
 The commit to add KASAN support broke booting on 32-bit SMP machines, due to a
 refactoring that moved some setup out of the secondary CPU path.
 
 A fix for another 32-bit SMP bug introduced by the fast syscall entry
 implementation for 32-bit BOOKE. And a build fix for the same commit.
 
 Our change to allow the DAWR to be force enabled on Power9 introduced a bug in
 KVM, where we clobber r3 leading to a host crash.
 
 The same commit also exposed a previously unreachable bug in the nested KVM
 handling of DAWR, which could lead to an oops in a nested host.
 
 One of the DMA reworks broke the b43legacy WiFi driver on some people's
 powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
 
 A fix for TLB flushing in KVM introduced a new bug, as it neglected to also
 flush the ERAT, this could lead to memory corruption in the guest.
 
 Thanks to:
   Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael
   Neuling, Suraj Jitindar Singh.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdDg4YAAoJEFHr6jzI4aWACKYP/RG1cqYDjEWz0N9bxjAOanx6
 z//hPZqZrObEORx0mek07LNj6JDy4eL7CB9WaEudJjHt7mYugLYq0g7hUMVvBWnB
 irFEuzGJ8EgWl1aMbmz+fgf49PBIuroy2o/4pyzzQXoDaw44QyUaCke2VEBskQNG
 RW64C2rDVrPgpRHzBB9EZVNv7svmo6ERJsEpRvqP3PZG1ZxgXW+DXbEdSmJCcgAt
 8oI+z6frRv+0ez+nge7TULo8DuheShfxc7l0jFrd48i35v2qB/IowPr8cof9fRwM
 TqnB+3dZXHPKPz6J9mz80p9ZDe1omLzg6i9EiR2/7a3XGpRBo7kCg3Iri7N5pu0j
 LotK9l1+mXWLy5P6lOHH5/tEHv52Wqsvh5IetpNJ2tgXp3MzbOc1/Ut9h7Ag7cRw
 WRa7tNXQ5Ud8uPM1Pds8Ymhd+/nZ9RItjGcu6S095/OGpM1FJR9a0QnfUHMyfyuX
 kAGrJDWcAkCd/Q9tKHsQotuZAFmRCQe4JFkzTiGzwdjYWYgtTA1c/eIv3+SG7eLV
 1dsaIYzIS56b+Qz2Qc/pKHwho+I9o505Y7LFXxlCGXDDjyI72ioTQDwiSBzaZdc9
 ORwNchLfpXNpiNXRoRqAnqmhWxavYmA6oJ13RDBiMBxIUWHynVbEzLlX9fPNdBFj
 Kw3Zd15znokXBzU+1mDE
 =Ju1y
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "This is a frustratingly large batch at rc5. Some of these were sent
  earlier but were missed by me due to being distracted by other things,
  and some took a while to track down due to needing manual bisection on
  old hardware. But still we clearly need to improve our testing of KVM,
  and of 32-bit, so that we catch these earlier.

  Summary: seven fixes, all for bugs introduced this cycle.

   - The commit to add KASAN support broke booting on 32-bit SMP
     machines, due to a refactoring that moved some setup out of the
     secondary CPU path.

   - A fix for another 32-bit SMP bug introduced by the fast syscall
     entry implementation for 32-bit BOOKE. And a build fix for the same
     commit.

   - Our change to allow the DAWR to be force enabled on Power9
     introduced a bug in KVM, where we clobber r3 leading to a host
     crash.

   - The same commit also exposed a previously unreachable bug in the
     nested KVM handling of DAWR, which could lead to an oops in a
     nested host.

   - One of the DMA reworks broke the b43legacy WiFi driver on some
     people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.

   - A fix for TLB flushing in KVM introduced a new bug, as it neglected
     to also flush the ERAT, this could lead to memory corruption in the
     guest.

  Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
  Finger, Michael Neuling, Suraj Jitindar Singh"

* tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
  powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
  KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
  KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
  powerpc/32: fix build failure on book3e with KVM
  powerpc/booke: fix fast syscall entry on SMP
  powerpc/32s: fix initial setup of segment registers on secondary CPU
2019-06-22 09:09:42 -07:00
Suraj Jitindar Singh
5008711259 KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
When a guest vcpu moves from one physical thread to another it is
necessary for the host to perform a tlb flush on the previous core if
another vcpu from the same guest is going to run there. This is because the
guest may use the local form of the tlb invalidation instruction meaning
stale tlb entries would persist where it previously ran. This is handled
on guest entry in kvmppc_check_need_tlb_flush() which calls
flush_guest_tlb() to perform the tlb flush.

Previously the generic radix__local_flush_tlb_lpid_guest() function was
used, however the functionality was reimplemented in flush_guest_tlb()
to avoid the trace_tlbie() call as the flushing may be done in real
mode. The reimplementation in flush_guest_tlb() was missing an erat
invalidation after flushing the tlb.

This lead to observable memory corruption in the guest due to the
caching of stale translations. Fix this by adding the erat invalidation.

Fixes: 70ea13f6e6 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-20 22:11:25 +10:00
Thomas Gleixner
d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Paul Mackerras
70ea13f6e6 KVM: PPC: Book3S HV: Flush TLB on secondary radix threads
When running on POWER9 with kvm_hv.indep_threads_mode = N and the host
in SMT1 mode, KVM will run guest VCPUs on offline secondary threads.
If those guests are in radix mode, we fail to load the LPID and flush
the TLB if necessary, leading to the guest crashing with an
unsupported MMU fault.  This arises from commit 9a4506e11b ("KVM:
PPC: Book3S HV: Make radix handle process scoped LPID flush in C,
with relocation on", 2018-05-17), which didn't consider the case
where indep_threads_mode = N.

For simplicity, this makes the real-mode guest entry path flush the
TLB in the same place for both radix and hash guests, as we did before
9a4506e11b, though the code is now C code rather than assembly code.
We also have the radix TLB flush open-coded rather than calling
radix__local_flush_tlb_lpid_guest(), because the TLB flush can be
called in real mode, and in real mode we don't want to invoke the
tracepoint code.

Fixes: 9a4506e11b ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30 19:32:12 +10:00
Paul Mackerras
2940ba0c48 KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C code
This replaces assembler code in book3s_hv_rmhandlers.S that checks
the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush
with C code in book3s_hv_builtin.c.  Note that unlike the radix
version, the hash version doesn't do an explicit ERAT invalidation
because we will invalidate and load up the SLB before entering the
guest, and that will invalidate the ERAT.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30 19:32:01 +10:00
Paul Mackerras
03f953329b KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE
Currently, the KVM code assumes that if the host kernel is using the
XIVE interrupt controller (the new interrupt controller that first
appeared in POWER9 systems), then the in-kernel XICS emulation will
use the XIVE hardware to deliver interrupts to the guest.  However,
this only works when the host is running in hypervisor mode and has
full access to all of the XIVE functionality.  It doesn't work in any
nested virtualization scenario, either with PR KVM or nested-HV KVM,
because the XICS-on-XIVE code calls directly into the native-XIVE
routines, which are not initialized and cannot function correctly
because they use OPAL calls, and OPAL is not available in a guest.

This means that using the in-kernel XICS emulation in a nested
hypervisor that is using XIVE as its interrupt controller will cause a
(nested) host kernel crash.  To fix this, we change most of the places
where the current code calls xive_enabled() to select between the
XICS-on-XIVE emulation and the plain XICS emulation to call a new
function, xics_on_xive(), which returns false in a guest.

However, there is a further twist.  The plain XICS emulation has some
functions which are used in real mode and access the underlying XICS
controller (the interrupt controller of the host) directly.  In the
case of a nested hypervisor, this means doing XICS hypercalls
directly.  When the nested host is using XIVE as its interrupt
controller, these hypercalls will fail.  Therefore this also adds
checks in the places where the XICS emulation wants to access the
underlying interrupt controller directly, and if that is XIVE, makes
the code use the virtual mode fallback paths, which call generic
kernel infrastructure rather than doing direct XICS access.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-19 16:00:15 +11:00
Paul Mackerras
f3c18e9342 KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor
This adds code to call the H_IPI and H_EOI hypercalls when we are
running as a nested hypervisor (i.e. without the CPU_FTR_HVMODE cpu
feature) and we would otherwise access the XICS interrupt controller
directly or via an OPAL call.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09 16:04:27 +11:00
Paul Mackerras
f7035ce9f1 KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code
This is based on a patch by Suraj Jitindar Singh.

This moves the code in book3s_hv_rmhandlers.S that generates an
external, decrementer or privileged doorbell interrupt just before
entering the guest to C code in book3s_hv_builtin.c.  This is to
make future maintenance and modification easier.  The algorithm
expressed in the C code is almost identical to the previous
algorithm.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09 16:04:27 +11:00
Marek Szyprowski
6518202970 mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
cma_alloc() doesn't really support gfp flags other than __GFP_NOWARN, so
convert gfp_mask parameter to boolean no_warn parameter.

This will help to avoid giving false feeling that this function supports
standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
what has already been an issue: see commit dd65a941f6 ("arm64:
dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").

Link: http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@eucas1p2.samsung.com
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michał Nazarewicz <mina86@mina86.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:32 -07:00
Nicholas Piggin
7c1bd80cc2 KVM: PPC: Book3S HV: Send kvmppc_bad_interrupt NMIs to Linux handlers
It's possible to take a SRESET or MCE in these paths due to a bug
in the host code or a NMI IPI, etc. A recent bug attempting to load
a virtual address from real mode gave th complete but cryptic error,
abridged:

      Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
      LE SMP NR_CPUS=2048 NUMA PowerNV
      CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
      NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
      REGS: c000000fff76dd80 TRAP: 0200   Not tainted
      MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
      CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
      NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
      LR [c0000000000c2430] do_tlbies+0x230/0x2f0

Sending the NMIs through the Linux handlers gives a nicer output:

      Severe Machine check interrupt [Not recovered]
        NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
        Initiator: CPU
        Error type: Real address [Load (bad)]
          Effective address: d00017fffcc01a28
      opal: Machine check interrupt unrecoverable: MSR(RI=0)
      opal: Hardware platform error: Unrecoverable Machine Check exception
      CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G   M
      NIP:  c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
      REGS: c000000fff9e9d80 TRAP: 0200   Tainted: G   M
      MSR:  9000000000201001 <SF,HV,ME,LE>  CR: 48082222  XER: 00000000
      CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
      NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
      LR [c0000000000c23c0] do_tlbies+0x1c0/0x280

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:38:23 +10:00
Simon Guo
1143a70665 KVM: PPC: Add pt_regs into kvm_vcpu_arch and move vcpu->arch.gpr[] into it
Current regs are scattered at kvm_vcpu_arch structure and it will
be more neat to organize them into pt_regs structure.

Also it will enable reimplementation of MMIO emulation code with
analyse_instr() later.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:38:23 +10:00
Nicholas Piggin
d2e60075a3 powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:23 +11:00
Paul Mackerras
c01015091a KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts
This patch removes the restriction that a radix host can only run
radix guests, allowing us to run HPT (hashed page table) guests as
well.  This is useful because it provides a way to run old guest
kernels that know about POWER8 but not POWER9.

Unfortunately, POWER9 currently has a restriction that all threads
in a given code must either all be in HPT mode, or all in radix mode.
This means that when entering a HPT guest, we have to obtain control
of all 4 threads in the core and get them to switch their LPIDR and
LPCR registers, even if they are not going to run a guest.  On guest
exit we also have to get all threads to switch LPIDR and LPCR back
to host values.

To make this feasible, we require that KVM not be in the "independent
threads" mode, and that the CPU cores be in single-threaded mode from
the host kernel's perspective (only thread 0 online; threads 1, 2 and
3 offline).  That allows us to use the same code as on POWER8 for
obtaining control of the secondary threads.

To manage the LPCR/LPIDR changes required, we extend the kvm_split_info
struct to contain the information needed by the secondary threads.
All threads perform a barrier synchronization (where all threads wait
for every other thread to reach the synchronization point) on guest
entry, both before and after loading LPCR and LPIDR.  On guest exit,
they all once again perform a barrier synchronization both before
and after loading host values into LPCR and LPIDR.

Finally, it is also currently necessary to flush the entire TLB every
time we enter a HPT guest on a radix host.  We do this on thread 0
with a loop of tlbiel instructions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01 15:36:41 +11:00
Paul Mackerras
00bb6ae500 KVM: PPC: Book3S HV: Don't call real-mode XICS hypercall handlers if not enabled
When running a guest on a POWER9 system with the in-kernel XICS
emulation disabled (for example by running QEMU with the parameter
"-machine pseries,kernel_irqchip=off"), the kernel does not pass
the XICS-related hypercalls such as H_CPPR up to userspace for
emulation there as it should.

The reason for this is that the real-mode handlers for these
hypercalls don't check whether a XICS device has been instantiated
before calling the xics-on-xive code.  That code doesn't check
either, leading to potential NULL pointer dereferences because
vcpu->arch.xive_vcpu is NULL.  Those dereferences won't cause an
exception in real mode but will lead to kernel memory corruption.

This fixes it by adding kvmppc_xics_enabled() checks before calling
the XICS functions.

Cc: stable@vger.kernel.org # v4.11+
Fixes: 5af5099385 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01 15:09:32 +11:00
Paul Mackerras
857b99e140 KVM: PPC: Book3S HV: Handle unexpected interrupts better
At present, if an interrupt (i.e. an exception or trap) occurs in the
code where KVM is switching the MMU to or from guest context, we jump
to kvmppc_bad_host_intr, where we simply spin with interrupts disabled.
In this situation, it is hard to debug what happened because we get no
indication as to which interrupt occurred or where.  Typically we get
a cascade of stall and soft lockup warnings from other CPUs.

In order to get more information for debugging, this adds code to
create a stack frame on the emergency stack and save register values
to it.  We start half-way down the emergency stack in order to give
ourselves some chance of being able to do a stack trace on secondary
threads that are already on the emergency stack.

On POWER7 or POWER8, we then just spin, as before, because we don't
know what state the MMU context is in or what other threads are doing,
and we can't switch back to host context without coordinating with
other threads.  On POWER9 we can do better; there we load up the host
MMU context and jump to C code, which prints an oops message to the
console and panics.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-10-14 13:35:51 +11:00
Paul Mackerras
898b25b202 KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
Since commit b009031f74 ("KVM: PPC: Book3S HV: Take out virtual
core piggybacking code", 2016-09-15), we only have at most one
vcore per subcore.  Previously, the fact that there might be more
than one vcore per subcore meant that we had the notion of a
"master vcore", which was the vcore that controlled thread 0 of
the subcore.  We also needed a list per subcore in the core_info
struct to record which vcores belonged to each subcore.  Now that
there can only be one vcore in the subcore, we can replace the
list with a simple pointer and get rid of the notion of the
master vcore (and in fact treat every vcore as a master vcore).

We can also get rid of the subcore_vm[] field in the core_info
struct since it is never read.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-07-01 18:59:01 +10:00
Paul Mackerras
acde25726b KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers
POWER9 running a radix guest will take some hypervisor interrupts
without going to real mode (turning off the MMU).  This means that
early hypercall handlers may now be called in virtual mode.  Most of
the handlers work just fine in both modes, but there are some that
can crash the host if called in virtual mode, notably the TCE (IOMMU)
hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT.  These
already have both a real-mode and a virtual-mode version, so we
arrange for the real-mode version to return H_TOO_HARD for radix
guests, which will result in the virtual-mode version being called.

The other hypercall which is sensitive to the MMU mode is H_RANDOM.
It doesn't have a virtual-mode version, so this adds code to enable
it to be called in either mode.

An alternative solution was considered which would refuse to call any
of the early hypercall handlers when doing a virtual-mode exit from a
radix guest.  However, the XICS-on-XIVE code depends on the XICS
hypercalls being handled early even for virtual-mode exits, because
the handlers need to be called before the XIVE vCPU state has been
pulled off the hardware.  Therefore that solution would have become
quite invasive and complicated, and was rejected in favour of the
simpler, though less elegant, solution presented here.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-05-12 15:08:09 +10:00
Paolo Bonzini
4415b33528 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
	arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
	arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09 11:50:01 +02:00
Linus Torvalds
c6a677c6f3 Staging/IIO patches for 4.12-rc1
Here is the big staging tree update for 4.12-rc1.  And it's a big one,
 adding about 350k new lines of crap^Wcode, mostly all in a big dump of
 media drivers from Intel.  But there's other new drivers in here as
 well, yet-another-wifi driver, new IIO drivers, and a new crypto
 accelerator.  We also deleted a bunch of stuff, mostly in patch
 cleanups, but also the Android ION code has shrunk a lot, and the
 Android low memory killer driver was finally deleted, much to the
 celebration of the -mm developers.
 
 All of these have been in linux-next with a few build issues that will
 show up when you merge to your tree, I'll follow up with fixes for those
 after this gets merged.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWQzzlQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylNMgCcD+GoaF/Ml7YnULRl2GG/526II78AnitZ8qjd
 rPqeowMIewYu9fgckLUc
 =7rzO
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO updates from Greg KH:
 "Here is the big staging tree update for 4.12-rc1.

  It's a big one, adding about 350k new lines of crap^Wcode, mostly all
  in a big dump of media drivers from Intel. But there's other new
  drivers in here as well, yet-another-wifi driver, new IIO drivers, and
  a new crypto accelerator.

  We also deleted a bunch of stuff, mostly in patch cleanups, but also
  the Android ION code has shrunk a lot, and the Android low memory
  killer driver was finally deleted, much to the celebration of the -mm
  developers.

  All of these have been in linux-next with a few build issues that will
  show up when you merge to your tree"

Merge conflicts in the new rtl8723bs driver (due to the wifi changes
this merge window) handled as per linux-next, courtesy of Stephen
Rothwell.

* tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1182 commits)
  staging: fsl-mc/dpio: add cpu <--> LE conversion for dpaa2_fd
  staging: ks7010: remove line continuations in quoted strings
  staging: vt6656: use tabs instead of spaces
  staging: android: ion: Fix unnecessary initialization of static variable
  staging: media: atomisp: fix range checking on clk_num
  staging: media: atomisp: fix misspelled word in comment
  staging: media: atomisp: kmap() can't fail
  staging: atomisp: remove #ifdef for runtime PM functions
  staging: atomisp: satm include directory is gone
  atomisp: remove some more unused files
  atomisp: remove hmm_load/store/clear indirections
  atomisp: kill off mmgr_free
  atomisp: clean up the hmm init/cleanup indirections
  atomisp: handle allocation calls before init in the hmm layer
  staging: fsl-dpaa2/eth: Add maintainer for Ethernet driver
  staging: fsl-dpaa2/eth: Add TODO file
  staging: fsl-dpaa2/eth: Add trace points
  staging: fsl-dpaa2/eth: Add driver specific stats
  staging: fsl-dpaa2/eth: Add ethtool support
  staging: fsl-dpaa2/eth: Add Freescale DPAA2 Ethernet driver
  ...
2017-05-05 18:16:23 -07:00
Benjamin Herrenschmidt
5af5099385 KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller
This patch makes KVM capable of using the XIVE interrupt controller
to provide the standard PAPR "XICS" style hypercalls. It is necessary
for proper operations when the host uses XIVE natively.

This has been lightly tested on an actual system, including PCI
pass-through with a TG3 device.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build
 failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and
 adding empty stubs for the kvm_xive_xxx() routines, fixup subject,
 integrate fixes from Paul for building PR=y HV=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27 21:37:29 +10:00
Laura Abbott
f318dd083c cma: Store a name in the cma structure
Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it
useful to have an explicit name attached to each region. Store the name
in each CMA structure.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18 20:41:12 +02:00
Benjamin Herrenschmidt
d381d7caf8 powerpc: Consolidate variants of real-mode MMIOs
We have all sort of variants of MMIO accessors for the real mode
instructions. This creates a clean set of accessors based on
Linux normal naming conventions, replacing all occurrences of
the old ones in the tree.

I have purposefully removed the "out/in" variants in favor of
only including __raw variants. Any code using these is already
pretty much hand tuned to operate in a very specific environment.
I've fixed up the 2 users (only one of them actually needed
a barrier in the first place).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:43:16 +10:00
Benjamin Herrenschmidt
243e25112d powerpc/xive: Native exploitation of the XIVE interrupt controller
The XIVE interrupt controller is the new interrupt controller
found in POWER9. It supports advanced virtualization capabilities
among other things.

Currently we use a set of firmware calls that simulate the old
"XICS" interrupt controller but this is fairly inefficient.

This adds the framework for using XIVE along with a native
backend which OPAL for configuration. Later, a backend allowing
the use in a KVM or PowerVM guest will also be provided.

This disables some fast path for interrupts in KVM when XIVE is
enabled as these rely on the firmware emulation code which is no
longer available when the XIVE is used natively by Linux.

A latter patch will make KVM also directly exploit the XIVE, thus
recovering the lost performance (and more).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings,
 tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV,
 fix build errors when SMP=n, fold in fixes from Ben:
   Don't call cpu_online() on an invalid CPU number
   Fix irq target selection returning out of bounds cpu#
   Extra sanity checks on cpu numbers
 ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:41:34 +10:00
Lucas Stach
e2f466e32f mm: cma_alloc: allow to specify GFP mask
Most users of this interface just want to use it with the default
GFP_KERNEL flags, but for cases where DMA memory is allocated it may be
called from a different context.

No functional change yet, just passing through the flag to the
underlying alloc_contig_range function.

Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Paul Mackerras
a4a741a048 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in a fix which touches both PPC and KVM code,
which was therefore put into a topic branch in the powerpc
tree.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-08 19:35:34 +11:00
Benjamin Herrenschmidt
ab9bad0ead powerpc/powernv: Remove separate entry for OPAL real mode calls
All entry points already read the MSR so they can easily do
the right thing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:40:18 +11:00
David Gibson
db9a290d9c KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:23 +11:00
Paul Mackerras
53af3ba2e8 KVM: PPC: Book3S HV: Allow guest exit path to have MMU on
If we allow LPCR[AIL] to be set for radix guests, then interrupts from
the guest to the host can be delivered by the hardware with relocation
on, and thus the code path starting at kvmppc_interrupt_hv can be
executed in virtual mode (MMU on) for radix guests (previously it was
only ever executed in real mode).

Most of the code is indifferent to whether the MMU is on or off, but
the calls to OPAL that use the real-mode OPAL entry code need to
be switched to use the virtual-mode code instead.  The affected
calls are the calls to the OPAL XICS emulation functions in
kvmppc_read_one_intr() and related functions.  We test the MSR[IR]
bit to detect whether we are in real or virtual mode, and call the
opal_rm_* or opal_* function as appropriate.

The other place that depends on the MMU being off is the optimization
where the guest exit code jumps to the external interrupt vector or
hypervisor doorbell interrupt vector, or returns to its caller (which
is __kvmppc_vcore_entry).  If the MMU is on and we are returning to
the caller, then we don't need to use an rfid instruction since the
MMU is already on; a simple blr suffices.  If there is an external
or hypervisor doorbell interrupt to handle, we branch to the
relocation-on version of the interrupt vector.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:51 +11:00
Paul Mackerras
e34af78490 KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h
This moves the prototypes for functions that are only called from
assembler code out of asm/asm-prototypes.h into asm/kvm_ppc.h.
The prototypes were added in commit ebe4535fbe ("KVM: PPC:
Book3S HV: sparse: prototypes for functions called from assembler",
2016-10-10), but given that the functions are KVM functions,
having them in a KVM header will be better for long-term
maintenance.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-12-01 14:03:46 +11:00
Paul Mackerras
e2702871b4 KVM: PPC: Book3S HV: Fix compilation with unusual configurations
This adds the "again" parameter to the dummy version of
kvmppc_check_passthru(), so that it matches the real version.
This fixes compilation with CONFIG_BOOK3S_64_HV set but
CONFIG_KVM_XICS=n.

This includes asm/smp.h in book3s_hv_builtin.c to fix compilation
with CONFIG_SMP=n.  The explicit inclusion is necessary to provide
definitions of hard_smp_processor_id() and get_hard_smp_processor_id()
in UP configs.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24 14:20:00 +11:00
Paul Mackerras
f725758b89 KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9
POWER9 includes a new interrupt controller, called XIVE, which is
quite different from the XICS interrupt controller on POWER7 and
POWER8 machines.  KVM-HV accesses the XICS directly in several places
in order to send and clear IPIs and handle interrupts from PCI
devices being passed through to the guest.

In order to make the transition to XIVE easier, OPAL firmware will
include an emulation of XICS on top of XIVE.  Access to the emulated
XICS is via OPAL calls.  The one complication is that the EOI
(end-of-interrupt) function can now return a value indicating that
another interrupt is pending; in this case, the XIVE will not signal
an interrupt in hardware to the CPU, and software is supposed to
acknowledge the new interrupt without waiting for another interrupt
to be delivered in hardware.

This adapts KVM-HV to use the OPAL calls on machines where there is
no XICS hardware.  When there is no XICS, we look for a device-tree
node with "ibm,opal-intc" in its compatible property, which is how
OPAL indicates that it provides XICS emulation.

In order to handle the EOI return value, kvmppc_read_intr() has
become kvmppc_read_one_intr(), with a boolean variable passed by
reference which can be set by the EOI functions to indicate that
another interrupt is pending.  The new kvmppc_read_intr() keeps
calling kvmppc_read_one_intr() until there are no more interrupts
to process.  The return value from kvmppc_read_intr() is the
largest non-zero value of the returns from kvmppc_read_one_intr().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24 09:24:23 +11:00
Paul Mackerras
1704a81cce KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9
On POWER9, the msgsnd instruction is able to send interrupts to
other cores, as well as other threads on the local core.  Since
msgsnd is generally simpler and faster than sending an IPI via the
XICS, we use msgsnd for all IPIs sent by KVM on POWER9.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24 09:24:23 +11:00