Commit graph

4948 commits

Author SHA1 Message Date
Al Viro
fd2d2b191f s390: get_user() should zero on failure
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13 17:50:13 -04:00
Paolo Bonzini
ad53e35ae5 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
Paul Mackerras writes:

    The highlights are:

    * Reduced latency for interrupts from PCI pass-through devices, from
      Suresh Warrier and me.
    * Halt-polling implementation from Suraj Jitindar Singh.
    * 64-bit VCPU statistics, also from Suraj.
    * Various other minor fixes and improvements.
2016-09-13 15:20:55 +02:00
Linus Torvalds
ac059c4fa7 * s390: nested virt fixes (new 4.8 feature)
* x86: fixes for 4.8 regressions
 * ARM: two small bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJX1xaBAAoJEL/70l94x66Db/8H/AuKWs6QbvUNnA+tWITdmFbi
 ah7R2BoRVak4h6UGYC4OsY9BTuWVeaCx2+8yLIqSdfWoYUJzwiGFPwkuUK/hVSkK
 /9gZO8SsREAm/1gVck2X2vzATYbfCxKcRsSq0/ocmIQEpRYWKGoJWS6zeiwLZ9kq
 H/KsAvsGc83VdlPXrhLVQa7js/Tl/M5JlBEbm6i8nIsvhoqZkES4rgwHKBtkxTyU
 8oepfNQnYTb2hZhJE0aW+9z3V0O+NKbGW75bKOzrbJBoEUGeHuPpLnP0mZIyEGPU
 grQkfuWMmfEaWrGahNA9ARpsNcy4Zp0BF5mgJ03OAmgfY+KmJroVk95IvcP9jF4=
 =+uuA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 - s390: nested virt fixes (new 4.8 feature)
 - x86: fixes for 4.8 regressions
 - ARM: two small bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm-arm: Unmap shadow pagetables properly
  x86, clock: Fix kvm guest tsc initialization
  arm: KVM: Fix idmap overlap detection when the kernel is idmap'ed
  KVM: lapic: adjust preemption timer correctly when goes TSC backward
  KVM: s390: vsie: fix riccbd
  KVM: s390: don't use current->thread.fpu.* when accessing registers
2016-09-12 14:30:14 -07:00
Christian Borntraeger
b0eb91ae63 Merge remote-tracking branch 'kvms390/s390forkvm' into kvms390next 2016-09-08 13:41:08 +02:00
Markus Elfring
0624a8eb82 KVM: s390: Use memdup_user() rather than duplicating code
* Reuse existing functionality from memdup_user() instead of keeping
  duplicate source code.

  This issue was detected by using the Coccinelle software.

* Return directly if this copy operation failed.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Message-Id: <c86f7520-885e-2829-ae9c-b81caa898e84@users.sourceforge.net>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 13:40:55 +02:00
Markus Elfring
a1708a2ead KVM: s390: Improve determination of sizes in kvm_s390_import_bp_data()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus reuse the corresponding function "kmalloc_array".

  Suggested-by: Paolo Bonzini <pbonzini@redhat.com>

  This issue was detected also by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

* Delete the local variable "size" which became unnecessary with
  this refactoring.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <c3323f6b-4af2-0bfb-9399-e529952e378e@users.sourceforge.net>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 13:40:54 +02:00
David Hildenbrand
a6940674c3 KVM: s390: allow 255 VCPUs when sca entries aren't used
If the SCA entries aren't used by the hardware (no SIGPIF), we
can simply not set the entries, stick to the basic sca and allow more
than 64 VCPUs.

To hinder any other facility from using these entries, let's properly
provoke intercepts by not setting the MCN and keeping the entries
unset.

This effectively allows when running KVM under KVM (vSIE) or under z/VM to
provide more than 64 VCPUs to a guest. Let's limit it to 255 for now, to
not run into problems if the CPU numbers are limited somewhere else.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 13:40:53 +02:00
Fan Zhang
80cd876338 KVM: s390: lazy enable RI
Only enable runtime instrumentation if the guest issues an RI related
instruction or if userspace changes the riccb to a valid state.
This makes entry/exit a tiny bit faster.

Initial patch by Christian Borntraeger
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 13:40:39 +02:00
Janosch Frank
c14b88d766 KVM: s390: gaccess: simplify translation exception handling
The payload data for protection exceptions is a superset of the
payload of other translation exceptions. Let's set the additional
flags and use a fall through to minimize code duplication.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:53 +02:00
David Hildenbrand
b1ffffbd0f KVM: s390: guestdbg: separate defines for per code
Let's avoid working with the PER_EVENT* defines, used for control register
manipulation, when checking the u8 PER code. Introduce separate defines
based on the existing defines.

Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:52 +02:00
David Hildenbrand
8953fb08ab KVM: s390: write external damage code on machine checks
Let's also write the external damage code already provided by
struct kvm_s390_mchk_info.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:52 +02:00
David Hildenbrand
ff5dc1492a KVM: s390: fix delivery of vector regs during machine checks
Vector registers are only to be stored if the facility is available
and if the guest has set up the machine check extended save area.

If anything goes wrong while writing the vector registers, the vector
registers are to be marked as invalid. Please note that we are allowed
to write the registers although they are marked as invalid.

Machine checks and "store status" SIGP orders are two different concepts,
let's correctly separate these. As the SIGP part is completely handled in
user space, we can drop it.

This patch is based on a patch from Cornelia Huck.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:52 +02:00
David Hildenbrand
0319dae677 KVM: s390: split store status and machine check handling
Store status writes the prefix which is not to be done by a machine check.
Also, the psw is stored and later on overwritten by the failing-storage
address, which looks strange at first sight.

Store status and machine check handling look similar, but they are actually
two different things.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:51 +02:00
David Hildenbrand
d6404ded30 KVM: s390: factor out actual delivery of machine checks
Let's factor this out to prepare for bigger changes. Reorder to calls to
match the logical order given in the PoP.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-09-08 09:07:51 +02:00
Suraj Jitindar Singh
8a7e75d47b KVM: Add provisioning for ulong vm stats and u64 vcpu stats
vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.

Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:25:37 +10:00
Martin Schwidefsky
c783b91ebd s390: add assembler include path for vx-insn.h
With git commit 0eab11c7e0
"s390/vx: allow to include vx-insn.h with .include"
and an older gcc we get errors like this:

{standard input}:6: Error: can't open asm/vx-insn.h for reading:
No such file or directory
arch/s390/kernel/fpu.c:57: Error: Unrecognized opcode: `vstm'

To solve this issue simply add the path to arch/s390/include to
all assembler runs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-06 11:00:04 +02:00
Colin Ian King
6512391a30 s390/crypto: avoid returning garbage value
Static analysis with cppcheck detected that ret is not initialized
and hence garbage is potentially being returned in the case where
prng_data->ppnows.reseed_counter <= prng_reseed_limit.

Thanks to Martin Schwidefsky for spotting a mistake in my original
fix.

Fixes: 0177db01ad ("s390/crypto: simplify return code handling")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-06 11:00:00 +02:00
Bhaktipriya Shridhar
e68f1d4ca9 s390: Remove deprecated create_singlethread_workqueue
The workqueue "appldata_wq" has been replaced with an ordered dedicated
workqueue.

WQ_MEM_RECLAIM has not been set since the workqueue is not being used on
a memory reclaim path.

The adapter->work_queue queues multiple work items viz
&adapter->scan_work, &port->rport_work, &adapter->ns_up_work,
&adapter->stat_work, adapter->work_queue, &adapter->events.work,
&port->gid_pn_work, &port->test_link_work. Hence, an ordered
dedicated workqueue has been used.

WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-06 10:59:57 +02:00
David Hildenbrand
4d21cef3ea KVM: s390: vsie: fix riccbd
We store the address of riccbd at the wrong location, overwriting
gvrd. This means that our nested guest will not be able to use runtime
instrumentation. Also, a memory leak, if our KVM guest actually sets gvrd.

Not noticed until now, as KVM guests never make use of gvrd and runtime
instrumentation wasn't completely tested yet.

Reported-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-05 13:48:50 +02:00
Josh Poimboeuf
0d025d271e mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

   This is a static warning which happens when object size and copy size
   are both const, and copy size > object size.  I didn't see any false
   positives for this one.  So the function warning attribute seems to
   be working fine here.

   Note this scenario is always a bug and so I think it should be
   changed to *always* be an error, regardless of
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

   This is another static warning which happens when I enable
   __compiletime_object_size() for new compilers (and
   CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
   is const, but copy size is *not*.  In this case there's no way to
   compare the two at build time, so it gives the warning.  (Note the
   warning is a byproduct of the fact that gcc has no way of knowing
   whether the overflow function will be called, so the call isn't dead
   code and the warning attribute is activated.)

   So this warning seems to only indicate "this is an unusual pattern,
   maybe you should check it out" rather than "this is a bug".

   I get 102(!) of these warnings with allyesconfig and the
   __compiletime_object_size() gcc check removed.  I don't know if there
   are any real bugs hiding in there, but from looking at a small
   sample, I didn't see any.  According to Kees, it does sometimes find
   real bugs.  But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

   This is a runtime warning where object size is const, and copy size >
   object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

  2fb0815c9e ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size().  But in fact,
__compiletime_object_size() seems to be working fine.  The false
positives were instead triggered by #2 above.  (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-30 10:10:21 -07:00
Paolo Bonzini
20b8f9e2dd KVM: s390: Fix for fpu register errors since 4.7
This fixes a regression that was introduced by a semantic
 change in commit 3f6813b9a5 ("s390/fpu: allocate 'struct
 fpu' with the task_struct"). Symptoms are broken host userspace
 fpu registers if the old FPU set/get ioctls are used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJXvxNAAAoJEBF7vIC1phx8LdsQAIrzq8fRarrNhOX595ur1Js/
 ygB4TkwismmCtSIuqqBOkxlTS2HyED4o3C3i38x4wYiQQtGtGaS0nZFZ8VF064Po
 c9//lGMtJXTp+xJDodWTQPgRC/wdaoEOEdKwfm0svd8DmqivRmgAFZ3o+olZVBDL
 QlE9JBDk04+3z1eOLD4vdO0F7z4R/U0DJNdDQnPKGakJETO6Y6kAublpA1p27UHC
 dktpQOle2dEYt9EpBiTmMFIOXkMt0ilaS5SwTjydRlXvrpCV3edtJSlvvJm7/KRc
 SMdLUqQjeR7Pcaxg13p6wm+lj6Isq6bsB+fylnMd94rNia+OgVgETzGPMXCqE5HY
 8DeWL5m4PN73Gx2HO+MDIYmYFtlwkoTMOLcVyk6yXeoM1cUHDBhNmLw2RzZTx/rZ
 ArqO8qk20zbggdvEhEAL7npsWVBbX2BsLRfw/ALxVh+EP4IjMsj609Ll78oieay5
 fE6idPHySlhZPqkTo6qu5spYGwsUanTzNGrAxYGEBJwt8lET80vqnqM6Oq2CSvHL
 8zqsORBWzOJyJUyq174z63s7Pi8qGQEMmYE0qN8hIqCNK0YMMYPKCenL/SHegcr0
 ohWyvDxbkgeW0MwsF75wuJQ6mV/nnYcZHonrkHSWcdXn1QWuJiK5lffYgJmcO/JS
 7ppOE1YBoHVbN131h0R2
 =S2yL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: Fix for fpu register errors since 4.7

This fixes a regression that was introduced by a semantic
change in commit 3f6813b9a5 ("s390/fpu: allocate 'struct
fpu' with the task_struct"). Symptoms are broken host userspace
fpu registers if the old FPU set/get ioctls are used.
2016-08-30 14:11:33 +02:00
Martin Schwidefsky
7bac4f5b8e s390/crypto: simplify CPACF encryption / decryption functions
The double while loops of the CTR mode encryption / decryption functions
are overly complex for little gain. Simplify the functions to a single
while loop at the cost of an additional memcpy of a few bytes for every
4K page worth of data.
Adapt the other crypto functions to make them all look alike.

Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:11 +02:00
Martin Schwidefsky
69c0e360f9 s390/crypto: cpacf function detection
The CPACF code makes some assumptions about the availablity of hardware
support. E.g. if the machine supports KM(AES-256) without chaining it is
assumed that KMC(AES-256) with chaining is available as well. For the
existing CPUs this is true but the architecturally correct way is to
check each CPACF functions on its own. This is what the query function
of each instructions is all about.

Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:09 +02:00
Martin Schwidefsky
d863d5945f s390/crypto: simplify init / exit functions
The aes and the des module register multiple crypto algorithms
dependent on the availability of specific CPACF instructions.
To simplify the deregistration with crypto_unregister_alg add
an array with pointers to the successfully registered algorithms
and use it for the error handling in the init function and in
the module exit function.

Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:08 +02:00
Martin Schwidefsky
0177db01ad s390/crypto: simplify return code handling
The CPACF instructions can complete with three different condition codes:
CC=0 for successful completion, CC=1 if the protected key verification
failed, and CC=3 for partial completion.

The inline functions will restart the CPACF instruction for partial
completion, this removes the CC=3 case. The CC=1 case is only relevant
for the protected key functions of the KM, KMC, KMAC and KMCTR
instructions. As the protected key functions are not used by the
current code, there is no need for any kind of return code handling.

Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:07 +02:00
Martin Schwidefsky
edc63a3785 s390/crypto: cleanup cpacf function codes
Use a separate define for the decryption modifier bit instead of
duplicating the function codes for encryption / decrypton.
In addition use an unsigned type for the function code.

Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:05 +02:00
Martin Schwidefsky
474fd6e80f RAID/s390: add SIMD implementation for raid6 gen/xor
Using vector registers is slightly faster:

raid6: vx128x8  gen() 19705 MB/s
raid6: vx128x8  xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

raid6: int64x1  gen()  3018 MB/s
raid6: int64x1  xor()  1429 MB/s
raid6: int64x2  gen()  4661 MB/s
raid6: int64x2  xor()  3143 MB/s
raid6: int64x4  gen()  5392 MB/s
raid6: int64x4  xor()  3509 MB/s
raid6: int64x8  gen()  4441 MB/s
raid6: int64x8  xor()  3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:04 +02:00
Martin Schwidefsky
8f149ea6e9 s390/nmi: improve revalidation of fpu / vector registers
The machine check handler will do one of two things if the floating-point
control, a floating point register or a vector register can not be
revalidated:
1) if the PSW indicates user mode the process is terminated
2) if the PSW indicates kernel mode the system is stopped

To unconditionally stop the system for 2) is incorrect.

There are three possible outcomes if the floating-point control, a
floating point register or a vector registers can not be revalidated:
1) The kernel is inside a kernel_fpu_begin/kernel_fpu_end block and
   needs the register. The system is stopped.
2) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_CPU bit
   is not set. The user space process needs the register and is killed.
3) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_FPU bit
   is set. Neither the kernel nor the user space process needs the
   lost register. Just revalidate it and continue.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:03 +02:00
Martin Schwidefsky
7f79695cc1 s390/fpu: improve kernel_fpu_[begin|end]
In case of nested user of the FPU or vector registers in the kernel
the current code uses the mask of the FPU/vector registers of the
previous contexts to decide which registers to save and restore.
E.g. if the previous context used KERNEL_VXR_V0V7 and the next
context wants to use KERNEL_VXR_V24V31 the first 8 vector registers
are stored to the FPU state structure. But this is not necessary
as the next context does not use these registers.

Rework the FPU/vector register save and restore code. The new code
does a few things differently:
1) A lowcore field is used instead of a per-cpu variable.
2) The kernel_fpu_end function now has two parameters just like
   kernel_fpu_begin. The register flags are required by both
   functions to save / restore the minimal register set.
3) The inline functions kernel_fpu_begin/kernel_fpu_end now do the
   update of the register masks. If the user space FPU registers
   have already been stored neither save_fpu_regs nor the
   __kernel_fpu_begin/__kernel_fpu_end functions have to be called
   for the first context. In this case kernel_fpu_begin adds 7
   instructions and kernel_fpu_end adds 4 instructions.
3) The inline assemblies in __kernel_fpu_begin / __kernel_fpu_end
   to save / restore the vector registers are simplified a bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:05:01 +02:00
Martin Schwidefsky
0eab11c7e0 s390/vx: allow to include vx-insn.h with .include
To make the vx-insn.h more versatile avoid cpp preprocessor macros
and allow to use plain numbers for vector and general purpose register
operands. With that you can emit an .include from a C file into the
assembler text and then use the vx-insn macros in inline assemblies.

For example:

asm (".include \"asm/vx-insn.h\"");

static inline void xor_vec(int x, int y, int z)
{
	asm volatile("VX %0,%1,%2"
		     : : "i" (x), "i" (y), "i" (z));
}

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:04:59 +02:00
David Hildenbrand
67f03de5f0 s390/time: avoid races when updating tb_update_count
The increment might not be atomic and we're not holding the
timekeeper_lock. Therefore we might lose an update to count, resulting in
VDSO being trapped in a loop. As other archs also simply update the
values and count doesn't seem to have an impact on reloading of these
values in VDSO code, let's just remove the update of tb_update_count.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:04:58 +02:00
David Hildenbrand
0c00b1e00b s390/time: fixup the clock comparator on all cpus
By leaving fixup_cc unset, only the clock comparator of the cpu actually
doing the sync is fixed up until now.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:04:56 +02:00
David Hildenbrand
ca64f63901 s390/time: cleanup etr leftovers
There are still some etr leftovers and wrong comments, let's clean that up.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:04:54 +02:00
David Hildenbrand
41ad022039 s390/time: simplify stp time syncs
The way we call do_adjtimex() today is broken. It has 0 effect, as
ADJ_OFFSET_SINGLESHOT (0x0001) in the kernel maps to !ADJ_ADJTIME
(in contrast to user space where it maps to  ADJ_OFFSET_SINGLESHOT |
ADJ_ADJTIME - 0x8001). !ADJ_ADJTIME will silently ignore all adjustments
without STA_PLL being active. We could switch to ADJ_ADJTIME or turn
STA_PLL on, but still we would run into some problems:

- Even when switching to nanoseconds, we lose accuracy.
- Successive calls to do_adjtimex() will simply overwrite any leftovers
  from the previous call (if not fully handled)
- Anything that NTP does using the sysctl heavily interferes with our
  use.
- !ADJ_ADJTIME will silently round stuff > or < than 0.5 seconds

Reusing do_adjtimex() here just feels wrong. The whole STP synchronization
works right now *somehow* only, as do_adjtimex() does nothing and our
TOD clock jumps in time, although it shouldn't. This is especially bad
as the clock could jump backwards in time. We will have to find another
way to fix this up.

As leap seconds are also not properly handled yet, let's just get rid of
all this complex logic altogether and use the correct clock_delta for
fixing up the clock comparator and keeping the sched_clock monotonic.

This change should have 0 effect on the current STP mechanism. Once we
know how to best handle sync events and leap second updates, we'll start
with a fresh implementation.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-29 11:04:52 +02:00
Masahiro Yamada
a5ff1b34e1 treewide: replace config_enabled() with IS_ENABLED() (2nd round)
Commit 97f2645f35 ("tree-wide: replace config_enabled() with
IS_ENABLED()") mostly killed config_enabled(), but some new users have
appeared for v4.8-rc1.  They are all used for a boolean option, so can
be replaced with IS_ENABLED() safely.

Link: http://lkml.kernel.org/r/1471970749-24867-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-26 17:39:35 -07:00
Martin Schwidefsky
88bf46bfde Merge branch 's390forkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
Pull facility mask patch from the KVM tree.

* tag 's390forkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
  KVM: s390: generate facility mask from readable list
2016-08-26 16:34:19 +02:00
Heiko Carstens
f6c1d359be KVM: s390: generate facility mask from readable list
Automatically generate the KVM facility mask out of a readable list.
Manually changing the masks is very error prone, especially if the
special IBM bit numbering has to be considered.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-08-25 22:47:03 +02:00
David Hildenbrand
a7d4b8f256 KVM: s390: don't use current->thread.fpu.* when accessing registers
As the meaning of these variables and pointers seems to change more
frequently, let's directly access our save area, instead of going via
current->thread.

Right now, this is broken for set/get_fpu. They simply overwrite the
host registers, as the pointers to the current save area were turned
into the static host save area.

Cc: stable@vger.kernel.org # 4.7
Fixes: 3f6813b9a5 ("s390/fpu: allocate 'struct fpu' with the task_struct")
Reported-by: Hao QingFeng <haoqf@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-08-25 17:33:24 +02:00
Josh Poimboeuf
9a7c348ba6 ftrace: Add return address pointer to ftrace_ret_stack
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Martin Schwidefsky
bd3a172557 s390/pci: add zpci_report_error interface
The 'report_error' interface for PCI devices found on s390 can be
used by a user space program to inject an adapter error notification.
Add a new kernel interface zpci_report_error to allow a PCI device
driver to inject these error notifications without a detour over
user space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24 09:23:56 +02:00
Martin Schwidefsky
47e4d851c5 s390/mm: merge local / non-local IDTE helper
Merge the __p[m|u]xdp_idte and __p[m|u]dp_idte_local functions into a
single __p[m|u]dp_idte function with an additional parameter.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24 09:23:56 +02:00
Martin Schwidefsky
34eeaf376d s390/mm: merge local / non-local IPTE helper
Merge the __ptep_ipte and __ptep_ipte_local functions into a single
__ptep_ipte function with an additional parameter. The __pte_ipte_range
function is still extra as the while loops makes it hard to merge.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24 09:23:55 +02:00
Martin Schwidefsky
44b6cc8130 s390/mm,kvm: flush gmap address space with IDTE
The __tlb_flush_mm() helper uses a global flush if the mm struct
has a gmap structure attached to it. Replace the global flush with
two individual flushes by means of the IDTE instruction if only a
single gmap is attached the the mm.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24 09:23:55 +02:00
Martin Schwidefsky
d5dcafee5f s390/mm: no local TLB flush for clearing-by-ASCE IDTE
The local-clearing control of the IDTE instruction does not have any effect
for the clearing-by-ASCE operation. Only the invalidation-and-clearing
operation respects the local-clearing bit.

Remove __tlb_flush_idte_local and simplify the batched TLB flushing code.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-24 09:23:54 +02:00
Linus Torvalds
45b6ae761e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "A couple of bug fixes, minor cleanup and a change to the default
  config"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: fix failing CUIR assignment under LPAR
  s390/pageattr: handle numpages parameter correctly
  s390/dasd: fix hanging device after clear subchannel
  s390/qdio: avoid reschedule of outbound tasklet once killed
  s390/qdio: remove checks for ccw device internal state
  s390/qdio: fix double return code evaluation
  s390/qdio: get rid of spin_lock_irqsave usage
  s390/cio: remove subchannel_id from ccw_device_private
  s390/qdio: obtain subchannel_id via ccw_device_get_schid()
  s390/cio: stop using subchannel_id from ccw_device_private
  s390/config: make the vector optimized crc function builtin
  s390/lib: fix memcmp and strstr
  s390/crc32-vx: Fix checksum calculation for small sizes
  s390: clarify compressed image code path
2016-08-16 15:50:22 -07:00
Linus Torvalds
329f415291 KVM locks kvm_device list to prevent corruption on device creation.
PPC splits debugfs initialization from creation of the xics device to
 unlock the newly taken kvm lock earlier.
 
 s390 prevents userspace from triggering two WARN_ON_ONCE.
 
 MIPS fixes several issues in the management of TLB faults (Cc: stable).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJXrx2ZAAoJEED/6hsPKofoo/4H/jra5NNxvpo09LWlXTwGXxBH
 cwcfDZSiOFxgvWztKJOIjPI4ETL3mnZvb9SFWBZZh1U0kfZ/TGiWouwaDNlBkPYj
 I3YHuPI7if+yUOmJlI3N2hWa0Wo0qiMqIjKT0pQVSLLdK/CVE+xGyS+qtXTNXHQn
 pFdKlYr//7OwQEY0ow1yj5VnsFrXB1JWFyB/+N5zaCfbCaQVyZAL7rj8SUbC/32W
 CiNhrvatzierKIfPerWw8DvvBKhCgWaRuLl0W+uMncrC9Qepcx9moM2beD1txK2I
 iHor1TDxUPifGQONfWMAlw87FluzHF4vQ5nN2jyTi8TT+CEfZpZ43Q+DY7okD4w=
 =NQP9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "KVM:
   - lock kvm_device list to prevent corruption on device creation.

  PPC:
   - split debugfs initialization from creation of the xics device to
     unlock the newly taken kvm lock earlier.

  s390:
   - prevent userspace from triggering two WARN_ON_ONCE.

  MIPS:
   - fix several issues in the management of TLB faults (Cc: stable)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  MIPS: KVM: Propagate kseg0/mapped tlb fault errors
  MIPS: KVM: Fix gfn range check in kseg0 tlb faults
  MIPS: KVM: Add missing gfn range check
  MIPS: KVM: Fix mapped fault broken commpage handling
  KVM: Protect device ops->create and list_add with kvm->lock
  KVM: PPC: Move xics_debugfs_init out of create
  KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
  KVM: s390: set the prefix initially properly
2016-08-13 10:11:14 -07:00
Julius Niedworok
aca411a4b1 KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
When triggering KVM_RUN without a user memory region being mapped
(KVM_SET_USER_MEMORY_REGION) a validity intercept occurs. This could
happen, if the user memory region was not mapped initially or if it
was unmapped after the vcpu is initialized. The function
kvm_s390_handle_requests checks for the KVM_REQ_MMU_RELOAD bit. The
check function always clears this bit. If gmap_mprotect_notify
returns an error code, the mapping failed, but the KVM_REQ_MMU_RELOAD
was not set anymore. So the next time kvm_s390_handle_requests is
called, the execution would fall trough the check for
KVM_REQ_MMU_RELOAD. The bit needs to be resetted, if
gmap_mprotect_notify returns an error code. Resetting the bit with
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu) fixes the bug.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Julius Niedworok <jniedwor@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-08-12 09:11:08 +02:00
Julius Niedworok
75a4615c95 KVM: s390: set the prefix initially properly
When KVM_RUN is triggered on a VCPU without an initial reset, a
validity intercept occurs.
Setting the prefix will set the KVM_REQ_MMU_RELOAD bit initially,
thus preventing the bug.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Julius Niedworok <jniedwor@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-08-12 09:10:17 +02:00
Linus Torvalds
6da7e95326 virtio/vhost: fixes and cleanups for 4.8
- Misc fixes and cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXq0ruAAoJECgfDbjSjVRp5P8H/2OlDJdSS1l+TwOXbY95ntQ1
 vxUX4vGCX5IujC+Rbt7sQV2prE3b6IktFNagpbRoWn21JkpoDMvPtYJrn5BhLtoh
 fvDkZE6Wo3QztFSjaUBZWEABBt03KPX0yrAIZplu8ne/Z8KAT3zK57BPnKfmxwv+
 dpxt+1wlnqAvYsoUUQZBFT4Gmk2oDiTofiIbQq7W9W/fooznLtLB+ArYtdfNJizC
 JnI/vJuWceEXfjT26HexCRhA2OZskrA4ZadDhOjAqkTPN5DHfweLDuHh7IsVfDd1
 wXqjc4ks3cYG0CloJ2qY2K7RpDOFIxIizixeDIuAbn9aX4sPOYYfqRm+4iRwmqQ=
 =9aUO
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
 "Misc fixes and cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio/s390: deprecate old transport
  virtio/s390: keep early_put_chars
  virtio_blk: Fix a slient kernel panic
  virtio-vsock: fix include guard typo
  vhost/vsock: fix vhost virtio_vsock_pkt use-after-free
  9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc()
  virtio: fix error handling for debug builds
  virtio: fix memory leak in virtqueue_add()
2016-08-11 14:10:23 -07:00
Heiko Carstens
4d81aaa53c s390/pageattr: handle numpages parameter correctly
Both set_memory_ro() and set_memory_rw() will modify the page
attributes of at least one page, even if the numpages parameter is
zero.

The author expected that calling these functions with numpages == zero
would never happen. However with the new 444d13ff10 ("modules: add
ro_after_init support") feature this happens frequently.

Therefore do the right thing and make these two functions return
gracefully if nothing should be done.

Fixes crashes on module load like this one:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 000003ff80008000 TEID: 000003ff80008407
Fault in home space mode while using kernel ASCE.
AS:0000000000d18007 R3:00000001e6aa4007 S:00000001e6a10800 P:00000001e34ee21d
Oops: 0004 ilc:3 [#1] SMP
Modules linked in: x_tables
CPU: 10 PID: 1 Comm: systemd Not tainted 4.7.0-11895-g3fa9045 #4
Hardware name: IBM              2964 N96              703              (LPAR)
task: 00000001e9118000 task.stack: 00000001e9120000
Krnl PSW : 0704e00180000000 00000000005677f8 (rb_erase+0xf0/0x4d0)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff80008b20 000003ff80008b20 000003ff80008b70 0000000000b9d608
           000003ff80008b20 0000000000000000 00000001e9123e88 000003ff80008950
           00000001e485ab40 000003ff00000000 000003ff80008b00 00000001e4858480
           0000000100000000 000003ff80008b68 00000000001d5998 00000001e9123c28
Krnl Code: 00000000005677e8: ec1801c3007c        cgij    %r1,0,8,567b6e
           00000000005677ee: e32010100020        cg      %r2,16(%r1)
          #00000000005677f4: a78401c2            brc     8,567b78
          >00000000005677f8: e35010080024        stg     %r5,8(%r1)
           00000000005677fe: ec5801af007c        cgij    %r5,0,8,567b5c
           0000000000567804: e30050000024        stg     %r0,0(%r5)
           000000000056780a: ebacf0680004        lmg     %r10,%r12,104(%r15)
           0000000000567810: 07fe                bcr     15,%r14
Call Trace:
([<000003ff80008900>] __this_module+0x0/0xffffffffffffd700 [x_tables])
([<0000000000264fd4>] do_init_module+0x12c/0x220)
([<00000000001da14a>] load_module+0x24e2/0x2b10)
([<00000000001da976>] SyS_finit_module+0xbe/0xd8)
([<0000000000803b26>] system_call+0xd6/0x264)
Last Breaking-Event-Address:
 [<000000000056771a>] rb_erase+0x12/0x4d0
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: e8a97e42dc ("s390/pageattr: allow kernel page table splitting")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-10 10:12:19 +02:00
Cornelia Huck
3b2fbb3f06 virtio/s390: deprecate old transport
There only ever have been two host implementations of the old
s390-virtio (pre-ccw) transport: the experimental kuli userspace,
and qemu. As qemu switched its default to ccw with 2.4 (with most
users having used ccw well before that) and removed the old transport
entirely in 2.6, s390-virtio probably hasn't been in active use for
quite some time and is therefore likely to bitrot.

Let's start the slow march towards removing the code by deprecating
it.

Note that this also deprecates the early virtio console code, which
has been causing trouble in the guest without being wired up in any
relevant hypervisor code.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-09 13:42:41 +03:00
Linus Torvalds
1eccfa090e Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user
bounds checking for most architectures on SLAB and SLUB.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJXl9tlAAoJEIly9N/cbcAm5BoP/ikTtDp2bFw1sn92yHTnIWzl
 O+dcKVAeRgjfnSvPfb1JITpaM58exQSaDsPBeR0DbVzU1zDdhLcwHHiQupFh98Ka
 vBZthbrlL/u4NB26enEEW0iyA32BsxYBMnIu0z5ux9RbZflmQwGQ0c0rvy3dJ7/b
 FzB5ayVST5y/a0m6/sImeeExh78GU9rsMb1XmJRMwlJAy6miDz/F9TP0LnuW6PhG
 J5XC99ygNJS1pQBLACRsrZw6ImgBxXnWCok6tWPMxFfD+rJBU2//wqS+HozyMWHL
 iYP7+ytVo/ZVok4114X/V4Oof3a6wqgpBuYrivJ228QO+UsLYbYLo6sZ8kRK7VFm
 9GgHo/8rWB1T9lBbSaa7UL5r0dVNNLjFGS42vwV+YlgUMQ1A35VRojO0jUnJSIQU
 Ug1IxKmylLd0nEcwD8/l3DXeQABsfL8GsoKW0OtdTZtW4RND4gzq34LK6t7hvayF
 kUkLg1OLNdUJwOi16M/rhugwYFZIMfoxQtjkRXKWN4RZ2QgSHnx2lhqNmRGPAXBG
 uy21wlzUTfLTqTpoeOyHzJwyF2qf2y4nsziBMhvmlrUvIzW1LIrYUKCNT4HR8Sh5
 lC2WMGYuIqaiu+NOF3v6CgvKd9UW+mxMRyPEybH8mEgfm+FLZlWABiBjIUpSEZuB
 JFfuMv1zlljj/okIQRg8
 =USIR
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull usercopy protection from Kees Cook:
 "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
  copy_from_user bounds checking for most architectures on SLAB and
  SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: SLUB hardened usercopy support
  mm: SLAB hardened usercopy support
  s390/uaccess: Enable hardened usercopy
  sparc/uaccess: Enable hardened usercopy
  powerpc/uaccess: Enable hardened usercopy
  ia64/uaccess: Enable hardened usercopy
  arm64/uaccess: Enable hardened usercopy
  ARM: uaccess: Enable hardened usercopy
  x86/uaccess: Enable hardened usercopy
  mm: Hardened usercopy
  mm: Implement stack frame object validation
  mm: Add is_migrate_cma_page
2016-08-08 14:48:14 -07:00
Christian Borntraeger
6e127efeea s390/config: make the vector optimized crc function builtin
For all configs with CONFIG_BTRFS_FS = y we should also make the
optimized crc module builtin. Otherwise early mounts will fall
back to the software variant.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-08 15:41:32 +02:00
Christian Borntraeger
e2efc42454 s390/lib: fix memcmp and strstr
if two string compare equal the clcle instruction will update the
string addresses to point _after_ the string. This might already
be on a different page, so we should not use these pointer to
calculate the difference as in that case the calculation of the
difference can cause oopses.

The return value of memcmp does not need the difference, we
can just reuse the condition code and return for CC=1 (All bytes
compared, first operand low) -1 and for CC=2 (All bytes compared,
first operand high) +1
strstr also does not need the diff.
While fixing this, make the common function clcle "correct on its
own" by using l1 instead of l2 for the first length. strstr will
call this with l2 for both strings.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: db7f5eef3d ("s390/lib: use basic blocks for inline assemblies")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-08 15:41:31 +02:00
Christian Borntraeger
134a24cd89 s390/crc32-vx: Fix checksum calculation for small sizes
The current prealign logic will fail for sizes < alignment,
as the new datalen passed to the vector function is smaller
than zero. Being a size_t this gets wrapped to a huge
number causing memory overruns and wrong data.

Let's add an early exit if the size is smaller than the minimal
size with alignment. This will also avoid calling the software
fallback twice for all sizes smaller than the minimum size
(prealign + remaining)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: f848dbd3bc ("s390/crc32-vx: add crypto API module for optimized CRC-32 algorithms")
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-08 15:41:31 +02:00
Sascha Silbe
bf47dc572a s390: clarify compressed image code path
The way the decompressor is hooked into the start-up code is rather
subtle, with a mix of multiply-defined symbols and hardcoded address
literals. Add some comments at the junction points to clarify how it
works.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-08 15:41:31 +02:00
Al Viro
711f5df7bf s390: move exports to definitions
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-07 23:47:20 -04:00
Krzysztof Kozlowski
00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Linus Torvalds
f716a85cd6 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
   Kees Cook.  The plugins are meant to be used for static analysis of
   the kernel code.  Two plugins are provided already.

 - reduction of the gcc commandline by Arnd Bergmann.

 - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

 - bin2c fix by Michael Tautschnig

 - setlocalversion fix by Wolfram Sang

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  gcc-plugins: disable under COMPILE_TEST
  kbuild: Abort build on bad stack protector flag
  scripts: Fix size mismatch of kexec_purgatory_size
  kbuild: make samples depend on headers_install
  Kbuild: don't add obj tree in additional includes
  Kbuild: arch: look for generated headers in obtree
  Kbuild: always prefix objtree in LINUXINCLUDE
  Kbuild: avoid duplicate include path
  Kbuild: don't add ../../ to include path
  vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
  kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
  kconfig.h: use already defined macros for IS_REACHABLE() define
  export.h: use __is_defined() to check if __KSYM_* is defined
  kconfig.h: use __is_defined() to check if MODULE is defined
  kbuild: setlocalversion: print error to STDERR
  Add sancov plugin
  Add Cyclomatic complexity GCC plugin
  GCC plugin infrastructure
  Shared library support
2016-08-02 16:37:12 -04:00
Linus Torvalds
221bb8a46e - ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
 
 - s390: support for trapping software breakpoints, nested virtualization
 (vSIE), the STHYI opcode, initial extensions for CPU model support.
 
 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
 preliminary to this and the upcoming support for hardware virtualization
 extensions.
 
 - x86: support for execute-only mappings in nested EPT; reduced vmexit
 latency for TSC deadline timer (by about 30%) on Intel hosts; support for
 more than 255 vCPUs.
 
 - PPC: bugfixes.
 
 The ugly bit is the conflicts.  A couple of them are simple conflicts due
 to 4.7 fixes, but most of them are with other trees. There was definitely
 too much reliance on Acked-by here.  Some conflicts are for KVM patches
 where _I_ gave my Acked-by, but the worst are for this pull request's
 patches that touch files outside arch/*/kvm.  KVM submaintainers should
 probably learn to synchronize better with arch maintainers, with the
 latter providing topic branches whenever possible instead of Acked-by.
 This is what we do with arch/x86.  And I should learn to refuse pull
 requests when linux-next sends scary signals, even if that means that
 submaintainers have to rebase their branches.
 
 Anyhow, here's the list:
 
 - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
 by the nvdimm tree.  This tree adds handle_preemption_timer and
 EXIT_REASON_PREEMPTION_TIMER at the same place.  In general all mentions
 of pcommit have to go.
 
 There is also a conflict between a stable fix and this patch, where the
 stable fix removed the vmx_create_pml_buffer function and its call.
 
 - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
 This tree adds kvm_io_bus_get_dev at the same place.
 
 - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
 file was completely removed for 4.8.
 
 - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
 this is a change that should have gone in through the irqchip tree and
 pulled by kvm-arm.  I think I would have rejected this kvm-arm pull
 request.  The KVM version is the right one, except that it lacks
 GITS_BASER_PAGES_SHIFT.
 
 - arch/powerpc: what a mess.  For the idle_book3s.S conflict, the KVM
 tree is the right one; everything else is trivial.  In this case I am
 not quite sure what went wrong.  The commit that is causing the mess
 (fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
 path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
 and arch/powerpc/kvm/.  It's large, but at 396 insertions/5 deletions
 I guessed that it wasn't really possible to split it and that the 5
 deletions wouldn't conflict.  That wasn't the case.
 
 - arch/s390: also messy.  First is hypfs_diag.c where the KVM tree
 moved some code and the s390 tree patched it.  You have to reapply the
 relevant part of commits 6c22c98637, plus all of e030c1125e, to
 arch/s390/kernel/diag.c.  Or pick the linux-next conflict
 resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
 Second, there is a conflict in gmap.c between a stable fix and 4.8.
 The KVM version here is the correct one.
 
 I have pushed my resolution at refs/heads/merge-20160802 (commit
 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
 SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
 x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
 qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
 =42ti
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:

 - ARM: GICv3 ITS emulation and various fixes.  Removal of the
   old VGIC implementation.

 - s390: support for trapping software breakpoints, nested
   virtualization (vSIE), the STHYI opcode, initial extensions
   for CPU model support.

 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
   of cleanups, preliminary to this and the upcoming support for
   hardware virtualization extensions.

 - x86: support for execute-only mappings in nested EPT; reduced
   vmexit latency for TSC deadline timer (by about 30%) on Intel
   hosts; support for more than 255 vCPUs.

 - PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
  KVM: PPC: Introduce KVM_CAP_PPC_HTM
  MIPS: Select HAVE_KVM for MIPS64_R{2,6}
  MIPS: KVM: Reset CP0_PageMask during host TLB flush
  MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
  MIPS: KVM: Sign extend MFC0/RDHWR results
  MIPS: KVM: Fix 64-bit big endian dynamic translation
  MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
  MIPS: KVM: Use 64-bit CP0_EBase when appropriate
  MIPS: KVM: Set CP0_Status.KX on MIPS64
  MIPS: KVM: Make entry code MIPS64 friendly
  MIPS: KVM: Use kmap instead of CKSEG0ADDR()
  MIPS: KVM: Use virt_to_phys() to get commpage PFN
  MIPS: Fix definition of KSEGX() for 64-bit
  KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
  kvm: x86: nVMX: maintain internal copy of current VMCS
  KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
  KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
  KVM: arm64: vgic-its: Simplify MAPI error handling
  KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
  KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
  ...
2016-08-02 16:11:27 -04:00
Jiri Olsa
e64a5470dc s390/ftrace/jprobes: Fix conflict between jprobes and function graph tracing
This fixes the same issue Steven already fixed for x86
in following commit:

  237d28db03 ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing

It fixes the crash, that happens when function graph tracing
and jprobes are used simultaneously. Please refer to above
commit for details.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-31 09:28:12 -04:00
James Hogan
68c5cf5a60 s390: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for s390 at all even though ARCH_DLINFO can contain one NEW_AUX_ENT when
VDSO is enabled.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which s390 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: b020632e40 ("[S390] introduce vdso on s390")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
2016-07-31 09:28:09 -04:00
Heiko Carstens
ef4423ce70 s390/numa: only set possible nodes within node_possible_map
Make sure that only those nodes appear in the node_possible_map that
may actually be used. Usually that means that the node online and
possible maps are identical. For mode "plain" we only have one node,
for mode "emu" we have "emu_nodes" nodes.

Before this the possible map included (with default config) 16 nodes
while usually only one was used. That made a couple of loops that
iterated over all possible nodes do more work than necessary.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:28:00 -04:00
Heiko Carstens
8814309163 s390/als: fix compile with gcov enabled
Fix this one when gcov is enabled:

arch/s390/kernel/als.o:(.data+0x118): undefined reference to `__gcov_merge_add'
arch/s390/kernel/als.o: In function `_GLOBAL__sub_I_65535_0_verify_facilities':
(.text.startup+0x8): undefined reference to `__gcov_init'

Please merge with "s390/als: convert architecture level set code to C".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:28:00 -04:00
Heiko Carstens
fbd6534ce0 s390/facilities: do not generate DWORDS define anymore
The architecture level set code has been converted to C and doesn't
need a define to figure out array sizes. Since the old code was the
only user of the DWORDS define, we can get rid of it again.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
a02d1988b5 s390/als: print missing facilities on facility mismatch
If the kernel needs more facilities to run than the machine provides
it is running on, print the facility bit numbers which are missing.

This allows to easily tell what went wrong and if simply the machine
does not provide a required facility or if either the kernel or the
hypervisor may have a bug.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
06ed5512a2 s390/als: print machine type on facility mismatch
If we have a facility mismatch the kernel only emits a warning that
the processor is not recent enough and stops operating. This doesn't
give us a lot of an idea of what actually went wrong.

As a first step print the machine type in addition.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
be2412c247 s390/als: convert architecture level set code to C
There is no reason to have this code in assembly language. Therefore
convert it to C.

Note that this code needs special treatment: it is called very early
and one of the side effects is that e.g. the bss section is not
cleared. Therefore the preferred way for static variables is to put
them on the stack which has a size of 16KB.

There is no functional change with this patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:58 -04:00
Heiko Carstens
cf9fdfea5f s390/sclp: move uninitialized data to data section
The early sclp code may be called before the bss section is
cleared. Therefore move all variables to the data section.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:58 -04:00
Gerald Schaefer
bc29b7ac1d s390/mm: clean up pte/pmd encoding
The hugetlbfs pte<->pmd conversion functions currently assume that the pmd
bit layout is consistent with the pte layout, which is not really true.

The SW read and write bits are encoded as the sequence "wr" in a pte, but
in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence
is identical in both cases, which results in swapped read and write bits
in the pmd. In practice this is not a problem, because those pmd bits are
only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code
works on (fake) ptes, and the converted pte bits are correct.

There is another variation in pte/pmd encoding which affects dirty
prot-none ptes/pmds. In this case, a pmd has both its HW read-only and
invalid bit set, while it is only the invalid bit for a pte. This also has
no effect in practice, but it should better be consistent.

This patch fixes both inconsistencies by changing the SW read/write bit
layout for pmds as well as the PAGE_NONE encoding for ptes. It also makes
the hugetlbfs conversion functions more robust by introducing a
move_set_bit() macro that uses the pte/pmd bit #defines instead of
constant shifts.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:57 -04:00
Linus Torvalds
797cee982e Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Six audit patches for 4.8.

  There are a couple of style and minor whitespace tweaks for the logs,
  as well as a minor fixup to catch errors on user filter rules, however
  the major improvements are a fix to the s390 syscall argument masking
  code (reviewed by the nice s390 folks), some consolidation around the
  exclude filtering (less code, always a win), and a double-fetch fix
  for recording the execve arguments"

* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix a double fetch in audit_log_single_execve_arg()
  audit: fix whitespace in CWD record
  audit: add fields to exclude filter by reusing user filter
  s390: ensure that syscall arguments are properly masked on s390
  audit: fix some horrible switch statement style crimes
  audit: fixup: log on errors from filter user rules
2016-07-29 17:54:17 -07:00
Linus Torvalds
7a1e8b80fb Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - TPM core and driver updates/fixes
   - IPv6 security labeling (CALIPSO)
   - Lots of Apparmor fixes
   - Seccomp: remove 2-phase API, close hole where ptrace can change
     syscall #"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
  apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
  tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
  tpm: Factor out common startup code
  tpm: use devm_add_action_or_reset
  tpm2_i2c_nuvoton: add irq validity check
  tpm: read burstcount from TPM_STS in one 32-bit transaction
  tpm: fix byte-order for the value read by tpm2_get_tpm_pt
  tpm_tis_core: convert max timeouts from msec to jiffies
  apparmor: fix arg_size computation for when setprocattr is null terminated
  apparmor: fix oops, validate buffer size in apparmor_setprocattr()
  apparmor: do not expose kernel stack
  apparmor: fix module parameters can be changed after policy is locked
  apparmor: fix oops in profile_unpack() when policy_db is not present
  apparmor: don't check for vmalloc_addr if kvzalloc() failed
  apparmor: add missing id bounds check on dfa verification
  apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
  apparmor: use list_next_entry instead of list_entry_next
  apparmor: fix refcount race when finding a child profile
  apparmor: fix ref count leak when profile sha1 hash is read
  apparmor: check that xindex is in trans_table bounds
  ...
2016-07-29 17:38:46 -07:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Mel Gorman
11fb998986 mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Linus Torvalds
468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Kirill A. Shutemov
dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Aneesh Kumar K.V
e77b0852b5 mm/mmu_gather: track page size with mmu gather and force flush if page size change
This allows an arch which needs to do special handing with respect to
different page size when flushing tlb to implement the same in mmu
gather.

Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Aneesh Kumar K.V
e9d55e1570 mm: change the interface for __tlb_remove_page()
This updates the generic and arch specific implementation to return true
if we need to do a tlb flush.  That means if a __tlb_remove_page
indicate a flush is needed, the page we try to remove need to be tracked
and added again after the flush.  We need to track it because we have
already update the pte to none and we can't just loop back.

This change is done to enable us to do a tlb_flush when we try to flush
a range that consists of different page sizes.  For architectures like
ppc64, we can do a range based tlb flush and we need to track page size
for that.  When we try to remove a huge page, we will force a tlb flush
and starts a new mmu gather.

[aneesh.kumar@linux.vnet.ibm.com: mm-change-the-interface-for-__tlb_remove_page-v3]
  Link: http://lkml.kernel.org/r/1465049193-22197-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1464860389-29019-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Kees Cook
97433ea4fd s390/uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on s390.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-07-26 14:41:52 -07:00
Linus Torvalds
bbce2ad2d7 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.8:

  API:
   - first part of skcipher low-level conversions
   - add KPP (Key-agreement Protocol Primitives) interface.

  Algorithms:
   - fix IPsec/cryptd reordering issues that affects aesni
   - RSA no longer does explicit leading zero removal
   - add SHA3
   - add DH
   - add ECDH
   - improve DRBG performance by not doing CTR by hand

  Drivers:
   - add x86 AVX2 multibuffer SHA256/512
   - add POWER8 optimised crc32c
   - add xts support to vmx
   - add DH support to qat
   - add RSA support to caam
   - add Layerscape support to caam
   - add SEC1 AEAD support to talitos
   - improve performance by chaining requests in marvell/cesa
   - add support for Araneus Alea I USB RNG
   - add support for Broadcom BCM5301 RNG
   - add support for Amlogic Meson RNG
   - add support Broadcom NSP SoC RNG"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
  crypto: vmx - Fix aes_p8_xts_decrypt build failure
  crypto: vmx - Ignore generated files
  crypto: vmx - Adding support for XTS
  crypto: vmx - Adding asm subroutines for XTS
  crypto: skcipher - add comment for skcipher_alg->base
  crypto: testmgr - Print akcipher algorithm name
  crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
  crypto: nx - off by one bug in nx_of_update_msc()
  crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
  crypto: scatterwalk - Inline start/map/done
  crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
  crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
  crypto: scatterwalk - Fix test in scatterwalk_done
  crypto: api - Optimise away crypto_yield when hard preemption is on
  crypto: scatterwalk - add no-copy support to copychunks
  crypto: scatterwalk - Remove scatterwalk_bytes_sglen
  crypto: omap - Stop using crypto scatterwalk_bytes_sglen
  crypto: skcipher - Remove top-level givcipher interface
  crypto: user - Remove crypto_lookup_skcipher call
  crypto: cts - Convert to skcipher
  ...
2016-07-26 13:40:17 -07:00
Linus Torvalds
015cd867e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There are a couple of new things for s390 with this merge request:

   - a new scheduling domain "drawer" is added to reflect the unusual
     topology found on z13 machines.  Performance tests showed up to 8
     percent gain with the additional domain.

   - the new crc-32 checksum crypto module uses the vector-galois-field
     multiply and sum SIMD instruction to speed up crc-32 and crc-32c.

   - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in
     the generic vmlinux.lds linker script definitions.

   - kcov instrumentation support.  A prerequisite for that is the
     inline assembly basic block cleanup, which is the reason for the
     net/iucv/iucv.c change.

   - support for 2GB pages is added to the hugetlbfs backend.

  Then there are two removals:

   - the oprofile hardware sampling support is dead code and is removed.
     The oprofile user space uses the perf interface nowadays.

   - the ETR clock synchronization is removed, this has been superseeded
     be the STP clock synchronization.  And it always has been
     "interesting" code..

  And the usual bug fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits)
  s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
  s390/smp: clean up a condition
  s390/cio/chp : Remove deprecated create_singlethread_workqueue
  s390/chsc: improve channel path descriptor determination
  s390/chsc: sanitize fmt check for chp_desc determination
  s390/cio: make fmt1 channel path descriptor optional
  s390/chsc: fix ioctl CHSC_INFO_CU command
  s390/cio/device_ops: fix kernel doc
  s390/cio: allow to reset channel measurement block
  s390/console: Make preferred console handling more consistent
  s390/mm: fix gmap tlb flush issues
  s390/mm: add support for 2GB hugepages
  s390: have unique symbol for __switch_to address
  s390/cpuinfo: show maximum thread id
  s390/ptrace: clarify bits in the per_struct
  s390: stack address vs thread_info
  s390: remove pointless load within __switch_to
  s390: enable kcov support
  s390/cpumf: use basic block for ecctr inline assembly
  s390/hypfs: use basic block for diag inline assembly
  ...
2016-07-26 12:22:51 -07:00
Linus Torvalds
c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Arnd Bergmann
58ab5e0c2c Kbuild: arch: look for generated headers in obtree
There are very few files that need add an -I$(obj) gcc for the preprocessor
or the assembler. For C files, we add always these for both the objtree and
srctree, but for the other ones we require the Makefile to add them, and
Kbuild then adds it for both trees.

As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the two instances in arch/x86 to use
an explictit $(objtree) prefix where needed, otherwise we won't find the
headers any more, as reported by the kbuild 0day builder.

arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-07-18 21:31:35 +02:00
David Hildenbrand
9acc317b18 KVM: s390: let ptff intercepts result in cc=3
We don't emulate ptff subfunctions, therefore react on any attempt of
execution by setting cc=3 (Requested function not available).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-18 14:15:00 +02:00
David Hildenbrand
6502a34cfd KVM: s390: allow user space to handle instr 0x0000
We will use illegal instruction 0x0000 for handling 2 byte sw breakpoints
from user space. As it can be enabled dynamically via a capability,
let's move setting of ICTL_OPEREXC to the post creation step, so we avoid
any races when enabling that capability just while adding new cpus.

Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-18 14:15:00 +02:00
Markus Elfring
64a40c8400 s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-18 10:17:24 +02:00
Dan Carpenter
61282affd2 s390/smp: clean up a condition
I can never remember precedence rules.  Let's add some parenthesis so
this code is more clear.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-18 10:17:23 +02:00
Daniel Borkmann
7e3f977edd perf, events: add non-linear data support for raw records
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.

If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.

This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.

The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].

  [1] http://thread.gmane.org/gmane.linux.network/421294

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Sebastian Andrzej Siewior
e3d617fe6a s390/perf: Convert the hotplug notifier to state machine callbacks (Sampling)
Install the callbacks via the state machine and let the core invoke the
callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.518084858@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:38 +02:00
Thomas Gleixner
4f0f8217e6 s390/perf: Convert the hotplug notifier to state machine callbacks (Counter)
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.436370635@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:37 +02:00
Radim Krčmář
c63cf538eb KVM: pass struct kvm to kvm_set_routing_entry
Arch-specific code will use it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:56 +02:00
Peter Oberparleiter
8f50af49f5 s390/console: Make preferred console handling more consistent
Use the same code structure when determining preferred consoles for
Linux running as KVM guest as with Linux running in LPAR and z/VM
guest:

 - Extend the console_mode variable to cover vt220 and hvc consoles
 - Determine sensible console defaults in conmode_default()
 - Remove KVM-special handling in set_preferred_console()

Ensure that the sclp line mode console is also registered when the
vt220 console was selected to not change existing behavior that
someone might be relying on.

As an externally visible change, KVM guest users can now select
the 3270 or 3215 console devices using the conmode= kernel parameter,
provided that support for the corresponding driver was compiled into
the kernel.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-13 10:58:07 +02:00
David Hildenbrand
f045402984 s390/mm: fix gmap tlb flush issues
__tlb_flush_asce() should never be used if multiple asce belong to a mm.

As this function changes mm logic determining if local or global tlb
flushes will be neded, we might end up flushing only the gmap asce on all
CPUs and a follow up mm asce flushes will only flush on the local CPU,
although that asce ran on multiple CPUs.

The missing tlb flushes will provoke strange faults in user space and even
low address protections in user space, crashing the kernel.

Fixes: 1b948d6cae ("s390/mm,tlb: optimize TLB flushing for zEC12")
Cc: stable@vger.kernel.org # 3.15+
Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-13 10:58:01 +02:00
Gerald Schaefer
d08de8e2d8 s390/mm: add support for 2GB hugepages
This adds support for 2GB hugetlbfs pages on s390.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-06 08:46:43 +02:00
David Hildenbrand
5ffe466cd3 KVM: s390: inject PER i-fetch events on applicable icpts
In case we have to emuluate an instruction or part of it (instruction,
partial instruction, operation exception), we have to inject a PER
instruction-fetching event for that instruction, if hardware told us to do
so.

In case we retry an instruction, we must not inject the PER event.

Please note that we don't filter the events properly yet, so guest
debugging will be visible for the guest.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-07-05 12:02:56 +02:00
Heiko Carstens
46210c440c s390: have unique symbol for __switch_to address
After linking there are several symbols for the same address that the
__switch_to symbol points to. E.g.:

000000000089b9c0 T __kprobes_text_start
000000000089b9c0 T __lock_text_end
000000000089b9c0 T __lock_text_start
000000000089b9c0 T __sched_text_end
000000000089b9c0 T __switch_to

When disassembling with "objdump -d" this results in a missing
__switch_to function. It would be named __kprobes_text_start
instead. To unconfuse objdump add a nop in front of the kprobes text
section. That way __switch_to appears again.

Obviously this solution is sort of a hack, since it also depends on
link order if this works or not. However it is the best I can come up
with for now.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-04 09:25:22 +02:00
Heiko Carstens
10f4954ae6 s390/cpuinfo: show maximum thread id
Expose the maximum thread id with /proc/cpuinfo.
With the new line the output looks like this:

vendor_id       : IBM/S390
bogomips per cpu: 20325.00
max thread id   : 1

With this new interface it is possible to always tell the correct
number of cpu threads potentially being used by the kernel.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-04 09:25:20 +02:00
Herbert Xu
64e26807bb crypto: s390/aes - Use skcipher for fallback
This patch replaces use of the obsolete blkcipher with skcipher.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-01 23:45:10 +08:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Martin Schwidefsky
25d539c9a9 s390/ptrace: clarify bits in the per_struct
The bits single_step and instruction_fetch lost their meaning
with git commit 5e9a26928f "[S390] ptrace cleanup".
Clarify the comment for these two bits.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 13:11:55 +02:00
Heiko Carstens
c3bdf2e11a s390: stack address vs thread_info
Avoid using the address of a process' thread_info structure as the
kernel stack address. This will break as soon as the thread_info
structure will be removed from the stack, and in addition it makes the
code a bit more understandable.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:33:00 +02:00
Heiko Carstens
43799597dc s390: remove pointless load within __switch_to
Remove a leftover from the code that transferred a couple of TIF bits
from the previous task to the next task.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:40 +02:00
Heiko Carstens
907fa061cc s390: enable kcov support
Now that hopefully all inline assemblies have been converted to single
basic blocks we can enable kcov on s390.

Note that this patch does not disable as many files on s390 like the
x86 variant does. Right now I didn't see a reason to do that, however
additional files or directories can be excluded at any time.

The runtime overhead seems to be quite high.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:39 +02:00
Heiko Carstens
e238c15e52 s390/cpumf: use basic block for ecctr inline assembly
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:38 +02:00
Heiko Carstens
e030c1125e s390/hypfs: use basic block for diag inline assembly
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:37 +02:00
Heiko Carstens
2c79813a1f s390/sysinfo: use basic block for stsi inline assembly
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:36 +02:00
Heiko Carstens
80a60f6ef1 s390/smp: use basic blocks for sigp inline assemblies
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:35 +02:00
Heiko Carstens
7b411ac6b7 s390/pci: use basic blocks for pci inline assemblies
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:32 +02:00
Heiko Carstens
931641c639 s390/mm: use basic block for essa inline assembly
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:29 +02:00
Heiko Carstens
db7f5eef3d s390/lib: use basic blocks for inline assemblies
Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:28 +02:00
Heiko Carstens
dc4aace160 s390/uaccess: fix __put_get_user_asm define
The __put_get_user_asm defines an inline assmembly which makes use of
the asm register construct. The parameters passed to that define may
also contain function calls.

It is a gcc restriction that between register asm statements and the
use of any such annotated variables function calls may clobber the
register / variable contents. Or in other words: gcc would generate
broken code.

This can be achieved e.g. with the following code:

    get_user(x, func() ? a : b);

where the call of func would clobber register zero which is used by
the __put_get_user_asm define.
To avoid this add two static inline functions which don't have these
side effects.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:27 +02:00
Heiko Carstens
109ab95471 s390/cpuinfo: rename cpu field to cpu number
The cpu field name within /proc/cpuinfo has a conflict with the
powerpc and sparc output where it contains the cpu model name.  So
rename the field name to cpu number which shouldn't generate any
conflicts.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:26 +02:00
Heiko Carstens
2764196f45 s390/perf: remove perf_release/reserver_sampling functions
Now that the oprofile sampling code is gone there is only one user of
the sampling facility left. Therefore the reserve and release
functions can be removed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:25 +02:00
Heiko Carstens
93dd49d002 s390/oprofile: remove hardware sampler support
Remove hardware sampler support from oprofile module.

The oprofile user space utilty has been switched to use the kernel
perf interface, for which we also provide hardware sampling support.

In addition the hardware sampling support is also slightly broken: it
supports only 16 bits for the pid and therefore would generate wrong
results on machines which have a pid >64k.

Also the pt_regs structure which was passed to oprofile common code
cannot necessarily be used to generate sane backtraces, since the
task(s) in question may run while the samples are fed to oprofile.
So the result would be more or less random.

However given that the only user space tools switched to the perf
interface already four years ago the hardware sampler code seems to be
unused code, and therefore it should be reasonable to remove it.

The timer based oprofile support continues to work.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
Acked-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Acked-by: Robert Richter <rric@kernel.org>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:22 +02:00
Martin Schwidefsky
bcf4dd5f9e s390: fix test_fp_ctl inline assembly contraints
The test_fp_ctl function is used to test if a given value is a valid
floating-point control. The inline assembly in test_fp_ctl uses an
incorrect constraint for the 'orig_fpc' variable. If the compiler
chooses the same register for 'fpc' and 'orig_fpc' the test_fp_ctl()
function always returns true. This allows user space to trigger
kernel oopses with invalid floating-point control values on the
signal stack.

This problem has been introduced with git commit 4725c86055
"s390: fix save and restore of the floating-point-control register"

Cc: stable@vger.kernel.org # v3.13+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:24:28 +02:00
Michael Holzheu
5419447e21 Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
This reverts commit 852ffd0f4e.

There are use cases where an intermediate boot kernel (1) uses kexec
to boot the final production kernel (2). For this scenario we should
provide the original boot information to the production kernel (2).
Therefore clearing the boot information during kexec() should not
be done.

Cc: stable@vger.kernel.org # v3.17+
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:24:27 +02:00
Michal Hocko
10d58bf297 s390: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

page_table_alloc then uses the flag for a single page allocation.  This
means that this flag has never been actually useful here because it has
always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-14-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Paul Moore
da7f750c1e s390: ensure that syscall arguments are properly masked on s390
When executing s390 code on s390x the syscall arguments are not
properly masked, leading to some malformed audit records.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-22 16:40:07 -04:00
David Hildenbrand
a411edf132 KVM: s390: vsie: add module parameter "nested"
Let's be careful first and allow nested virtualization only if enabled
by the system administrator. In addition, user space still has to
explicitly enable it via SCLP features for it to work.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:47 +02:00
David Hildenbrand
5d3876a8bf KVM: s390: vsie: add indication for future features
We have certain SIE features that we cannot support for now.
Let's add these features, so user space can directly prepare to enable
them, so we don't have to update yet another component.

In addition, add a comment block, telling why it is for now not possible to
forward/enable these features.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:47 +02:00
David Hildenbrand
91473b487d KVM: s390: vsie: correctly set and handle guest TOD
Guest 2 sets up the epoch of guest 3 from his point of view. Therefore,
we have to add the guest 2 epoch to the guest 3 epoch. We also have to take
care of guest 2 epoch changes on STP syncs. This will work just fine by
also updating the guest 3 epoch when a vsie_block has been set for a VCPU.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:46 +02:00
David Hildenbrand
b917ae573f KVM: s390: vsie: speed up VCPU external calls
Whenever a SIGP external call is injected via the SIGP external call
interpretation facility, the VCPU is not kicked. When a VCPU is currently
in the VSIE, the external call might not be processed immediately.

Therefore we have to provoke partial execution exceptions, which leads to a
kick of the VCPU and therefore also kick out of VSIE. This is done by
simulating the WAIT state. This bit has no other side effects.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:46 +02:00
David Hildenbrand
94a15de8fb KVM: s390: don't use CPUSTAT_WAIT to detect if a VCPU is idle
As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but
to force interception of external calls, let's check in the bitmap instead.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:45 +02:00
David Hildenbrand
adbf16985c KVM: s390: vsie: speed up VCPU irq delivery when handling vsie
Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we
have to kick it out of vsie, so the request will be handled faster.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:44 +02:00
David Hildenbrand
1b7029bec1 KVM: s390: vsie: try to refault after a reported fault to g2
We can avoid one unneeded SIE entry after we reported a fault to g2.
Theoretically, g2 resolves the fault and we can create the shadow mapping
directly, instead of failing again when entering the SIE.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:44 +02:00
David Hildenbrand
7fd7f39daa KVM: s390: vsie: support IBS interpretation
We can easily enable ibs for guest 2, so he can use it for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:43 +02:00
David Hildenbrand
13ee3f678b KVM: s390: vsie: support conditional-external-interception
We can easily enable cei for guest 2, so he can use it for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:42 +02:00
David Hildenbrand
5630a8e82b KVM: s390: vsie: support intervention-bypass
We can easily enable intervention bypass for guest 2, so it can use it
for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:42 +02:00
David Hildenbrand
a1b7b9b286 KVM: s390: vsie: support guest-storage-limit-suppression
We can easily forward guest-storage-limit-suppression if available.

One thing to care about is keeping the prefix properly mapped when
gsls in toggled on/off or the mso changes in between. Therefore we better
remap the prefix on any mso changes just like we already do with the
prefix.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:41 +02:00
David Hildenbrand
77d18f6d47 KVM: s390: vsie: support guest-PER-enhancement
We can easily forward the guest-PER-enhancement facility to guest 2 if
available.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:40 +02:00
David Hildenbrand
0615a326e0 KVM: s390: vsie: support shared IPTE-interlock facility
As we forward the whole SCA provided by guest 2, we can directly forward
SIIF if available.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:40 +02:00
David Hildenbrand
19c439b564 KVM: s390: vsie: support 64-bit-SCAO
Let's provide the 64-bit-SCAO facility to guest 2, so he can set up a SCA
for guest 3 that has a 64 bit address. Please note that we already require
the 64 bit SCAO for our vsie implementation, in order to forward the SCA
directly (by pinning the page).

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:39 +02:00
David Hildenbrand
588438cba0 KVM: s390: vsie: support run-time-instrumentation
As soon as guest 2 is allowed to use run-time-instrumentation (indicated
via via STFLE), it can also enable it for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:39 +02:00
David Hildenbrand
c9bc1eabe5 KVM: s390: vsie: support vectory facility (SIMD)
As soon as guest 2 is allowed to use the vector facility (indicated via
STFLE), it can also enable it for guest 3. We have to take care of the
sattellite block that might be used when not relying on lazy vector
copying (not the case for KVM).

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:38 +02:00
David Hildenbrand
166ecb3d3c KVM: s390: vsie: support transactional execution
As soon as guest 2 is allowed to use transactional execution (indicated via
STFLE), he can also enable it for guest 3.

Active transactional execution requires also the second prefix page to be
mapped. If that page cannot be mapped, a validity icpt has to be presented
to the guest.

We have to take care of tx being toggled on/off, otherwise we might get
wrong prefix validity icpt.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:37 +02:00
David Hildenbrand
bbeaa58b32 KVM: s390: vsie: support aes dea wrapping keys
As soon as message-security-assist extension 3 is enabled for guest 2,
we have to allow key wrapping for guest 3.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:37 +02:00
David Hildenbrand
66b630d5b7 KVM: s390: vsie: support STFLE interpretation
Issuing STFLE is extremely rare. Instead of copying 2k on every
VSIE call, let's do this lazily, when a guest 3 tries to execute
STFLE. We can setup the block and retry.

Unfortunately, we can't directly forward that facility list, as
we only have a 31 bit address for the facility list designation.
So let's use a DMA allocation for our vsie_page instead for now.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:36 +02:00
David Hildenbrand
4ceafa9027 KVM: s390: vsie: support host-protection-interruption
Introduced with ESOP, therefore available for the guest if it
is allowed to use ESOP.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:35 +02:00
David Hildenbrand
535ef81c6e KVM: s390: vsie: support edat1 / edat2
If guest 2 is allowed to use edat 1 / edat 2, it can also set it up for
guest 3, so let's properly check and forward the edat cpuflags.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:35 +02:00
David Hildenbrand
3573602b20 KVM: s390: vsie: support setting the ibc
As soon as we forward an ibc to guest 2 (indicated via
kvm->arch.model.ibc), he can also use it for guest 3. Let's properly round
the ibc up/down, so we avoid any potential validity icpts from the
underlying SIE, if it doesn't simply round the values.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:34 +02:00
David Hildenbrand
06d68a6c85 KVM: s390: vsie: optimize gmap prefix mapping
In order to not always map the prefix, we have to take care of certain
aspects that implicitly unmap the prefix:
- Changes to the prefix address
- Changes to MSO, because the HVA of the prefix is changed
- Changes of the gmap shadow (e.g. unshadowed, asce or edat changes)

By properly handling these cases, we can stop remapping the prefix when
there is no reason to do so.

This also allows us now to not acquire any gmap shadow locks when
rerunning the vsie and still having a valid gmap shadow.

Please note, to detect changing gmap shadows, we have to keep the reference
of the gmap shadow. The address of a gmap shadow does otherwise not
reliably indicate if the gmap shadow has changed (the memory chunk
could get reused).

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:34 +02:00
David Hildenbrand
a3508fbe9d KVM: s390: vsie: initial support for nested virtualization
This patch adds basic support for nested virtualization on s390x, called
VSIE (virtual SIE) and allows it to be used by the guest if the necessary
facilities are supported by the hardware and enabled for the guest.

In order to make this work, we have to shadow the sie control block
provided by guest 2. In order to gain some performance, we have to
reuse the same shadow blocks as good as possible. For now, we allow
as many shadow blocks as we have VCPUs (that way, every VCPU can run the
VSIE concurrently).

We have to watch out for the prefix getting unmapped out of our shadow
gmap and properly get the VCPU out of VSIE in that case, to fault the
prefix pages back in. We use the PROG_REQUEST bit for that purpose.

This patch is based on an initial prototype by Tobias Elpelt.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-21 09:43:33 +02:00
Linus Torvalds
97f78c7de8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Two more bugs fixes for 4.7:

   - a KVM regression introduced with the pgtable.c code split

   - a perf issue with two hardware PMUs using a shared event context"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpum_cf: use perf software context for hardware counters
  KVM: s390/mm: Fix CMMA reset during reboot
2016-06-20 10:18:58 -07:00
David Hildenbrand
3d84683bd7 s390: introduce page_to_virt() and pfn_to_virt()
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:32 +02:00
David Hildenbrand
37d9df98b7 KVM: s390: backup the currently enabled gmap when scheduled out
Nested virtualization will have to enable own gmaps. Current code
would enable the wrong gmap whenever scheduled out and back in,
therefore resulting in the wrong gmap being enabled.

This patch reenables the last enabled gmap, therefore avoiding having to
touch vcpu->arch.gmap when enabling a different gmap.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:24 +02:00
David Hildenbrand
65d0b0d4bc KVM: s390: fast path for shadow gmaps in gmap notifier
The default kvm gmap notifier doesn't have to handle shadow gmaps.
So let's just directly exit in case we get notified about one.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:21 +02:00
David Hildenbrand
01f719176f s390/mm: don't fault everything in read-write in gmap_pte_op_fixup()
Let's not fault in everything in read-write but limit it to read-only
where possible.

When restricting access rights, we already have the required protection
level in our hands. When reading from guest 2 storage (gmap_read_table),
it is obviously PROT_READ. When shadowing a pte, the required protection
level is given via the guest 2 provided pte.

Based on an initial patch by Martin Schwidefsky.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:20 +02:00
David Hildenbrand
5b6c963bce s390/mm: allow to check if a gmap shadow is valid
It will be very helpful to have a mechanism to check without any locks
if a given gmap shadow is still valid and matches the given properties.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:16 +02:00
David Hildenbrand
4a49443924 s390/mm: remember the int code for the last gmap fault
For nested virtualization, we want to know if we are handling a protection
exception, because these can directly be forwarded to the guest without
additional checks.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:08 +02:00
David Hildenbrand
717c05554a s390/mm: limit number of real-space gmap shadows
We have no known user of real-space designation and only support it to
be architecture compliant.

Gmap shadows with real-space designation are never unshadowed
automatically, as there is nothing to protect for the top level table.

So let's simply limit the number of such shadows to one by removing
existing ones on creation of another one.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:07 +02:00
David Hildenbrand
3218f7094b s390/mm: support real-space for gmap shadows
We can easily support real-space designation just like EDAT1 and EDAT2.
So guest2 can provide for guest3 an asce with the real-space control being
set.

We simply have to allocate the biggest page table possible and fake all
levels.

There is no protection to consider. If we exceed guest memory, vsie code
will inject an addressing exception (via program intercept). In the future,
we could limit the fake table level to the gmap page table.

As the top level page table can never go away, such gmap shadows will never
get unshadowed, we'll have to come up with another way to limit the number
of kept gmap shadows.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:02 +02:00
David Hildenbrand
1c65781b56 s390/mm: push rte protection down to shadow pte
Just like we already do with ste protection, let's take rte protection
into account. This way, the host pte doesn't have to be mapped writable.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:00 +02:00
David Hildenbrand
18b8980988 s390/mm: support EDAT2 for gmap shadows
If the guest is enabled for EDAT2, we can easily create shadows for
guest2 -> guest3 provided tables that make use of EDAT2.

If guest2 references a 2GB page, this memory looks consecutive for guest2,
but it does not have to be so for us. Therefore we have to create fake
segment and page tables.

This works just like EDAT1 support, so page tables are removed when the
parent table (r3t table entry) is changed.

We don't hve to care about:
- ACCF-Validity Control in RTTE
- Access-Control Bits in RTTE
- Fetch-Protection Bit in RTTE
- Common-Region Bit in RTTE

Just like for EDAT1, all bits might be dropped and there is no guaranteed
that they are active.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:56 +02:00
David Hildenbrand
fd8d4e3ab6 s390/mm: support EDAT1 for gmap shadows
If the guest is enabled for EDAT1, we can easily create shadows for
guest2 -> guest3 provided tables that make use of EDAT1.

If guest2 references a 1MB page, this memory looks consecutive for guest2,
but it might not be so for us. Therefore we have to create fake page tables.

We can easily add that to our existing infrastructure. The invalidation
mechanism will make sure that fake page tables are removed when the parent
table (sgt table entry) is changed.

As EDAT1 also introduced protection on all page table levels, we have to
also shadow these correctly.

We don't have to care about:
- ACCF-Validity Control in STE
- Access-Control Bits in STE
- Fetch-Protection Bit in STE
- Common-Segment Bit in STE

As all bits might be dropped and there is no guaranteed that they are
active ("unpredictable whether the CPU uses these bits", "may be used").
Without using EDAT1 in the shadow ourselfes (STE-format control == 0),
simply shadowing these bits would not be enough. They would be ignored.

Please note that we are using the "fake" flag to make this look consistent
with further changes (EDAT2, real-space designation support) and don't let
the shadow functions handle fc=1 stes.

In the future, with huge pages in the host, gmap_shadow_pgt() could simply
try to map a huge host page if "fake" is set to one and indicate via return
value that no lower fake tables / shadow ptes are required.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:51 +02:00
David Hildenbrand
5b062bd494 s390/mm: prepare for EDAT1/EDAT2 support in gmap shadow
In preparation for EDAT1/EDAT2 support for gmap shadows, we have to store
the requested edat level in the gmap shadow.

The edat level used during shadow translation is a property of the gmap
shadow. Depending on that level, the gmap shadow will look differently for
the same guest tables. We have to store it internally in order to support
it later.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:47 +02:00
David Hildenbrand
00fc062d53 s390/mm: push ste protection down to shadow pte
If a guest ste is read-only, it doesn't make sense to force the ptes in as
writable in the host. If the source page is read-only in the host, it won't
have to be made writable. Please note that if the source page is not
available, it will still be faulted in writable. This can be changed
internally later on.

If ste protection is removed, underlying shadow tables are also removed,
therefore this change does not affect the guest.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:45 +02:00
David Hildenbrand
f4debb4090 s390/mm: take ipte_lock during shadow faults
Let's take the ipte_lock while working on guest 2 provided page table, just
like the other gaccess functions.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:40 +02:00
David Hildenbrand
7a6741576b s390/mm: protection exceptions are corrrectly shadowed
As gmap shadows contains correct protection permissions, protection
exceptons can directly be forwarded to guest 3. If we would encounter
a protection exception while faulting, the next guest 3 run will
automatically handle that for us.

Keep the dat_protection logic in place, as it will be helpful later.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:34 +02:00
David Hildenbrand
e52f8b6112 s390/mm: take the mmap_sem in kvm_s390_shadow_fault()
Instead of doing it in the caller, let's just take the mmap_sem
in kvm_s390_shadow_fault(). By taking it as read, we allow parallel
faulting on shadow page tables, gmap shadow code is prepared for that.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:33 +02:00
David Hildenbrand
0f7f848915 s390/mm: fix races on gmap_shadow creation
Before any thread is allowed to use a gmap_shadow, it has to be fully
initialized. However, for invalidation to work properly, we have to
register the new gmap_shadow before we protect the parent gmap table.

Because locking is tricky, and we have to avoid duplicate gmaps, let's
introduce an initialized field, that signalizes other threads if that
gmap_shadow can already be used or if they have to retry.

Let's properly return errors using ERR_PTR() instead of simply returning
NULL, so a caller can properly react on the error.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:28 +02:00
David Hildenbrand
998f637cc4 s390/mm: avoid races on region/segment/page table shadowing
We have to unlock sg->guest_table_lock in order to call
gmap_protect_rmap(). If we sleep just before that call, another VCPU
might pick up that shadowed page table (while it is not protected yet)
and use it.

In order to avoid these races, we have to introduce a third state -
"origin set but still invalid" for an entry. This way, we can avoid
another thread already using the entry before the table is fully protected.
As soon as everything is set up, we can clear the invalid bit - if we
had no race with the unshadowing code.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:27 +02:00
David Hildenbrand
a9d23e71d7 s390/mm: shadow pages with real guest requested protection
We really want to avoid manually handling protection for nested
virtualization. By shadowing pages with the protection the guest asked us
for, the SIE can handle most protection-related actions for us (e.g.
special handling for MVPG) and we can directly forward protection
exceptions to the guest.

PTEs will now always be shadowed with the correct _PAGE_PROTECT flag.
Unshadowing will take care of any guest changes to the parent PTE and
any host changes to the host PTE. If the host PTE doesn't have the
fitting access rights or is not available, we have to fix it up.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:19 +02:00
David Hildenbrand
eea3678d43 s390/mm: flush tlb of shadows in all situations
For now, the tlb of shadow gmap is only flushed when the parent is removed,
not when it is removed upfront. Therefore other shadow gmaps can reuse the
tables without the tlb getting flushed.

Fix this by simply flushing the tlb
1. Before the shadow tables are removed (analogouos to other unshadow functions)
2. When the gmap is freed and therefore the top level pages are freed.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:18 +02:00
Martin Schwidefsky
aa17aa57cf s390/mm: add kvm shadow fault function
This patch introduces function kvm_s390_shadow_fault() used to resolve a
fault on a shadow gmap. This function will do validity checking and
build up the shadow page table hierarchy in order to fault in the
requested page into the shadow page table structure.

If an exception occurs while shadowing, guest 2 has to be notified about
it using either an exception or a program interrupt intercept. If
concurrent unshadowing occurres, this function will simply return with
-EAGAIN and the caller has to retry.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:12 +02:00
Martin Schwidefsky
4be130a084 s390/mm: add shadow gmap support
For a nested KVM guest the outer KVM host needs to create shadow
page tables for the nested guest. This patch adds the basic support
to the guest address space (gmap) code.

For each guest address space the inner KVM host creates, the first
outer KVM host needs to create shadow page tables. The address space
is identified by the ASCE loaded into the control register 1 at the
time the inner SIE instruction for the second nested KVM guest is
executed. The outer KVM host creates the shadow tables starting with
the table identified by the ASCE on a on-demand basis. The outer KVM
host will get repeated faults for all the shadow tables needed to
run the second KVM guest.

While a shadow page table for the second KVM guest is active the access
to the origin region, segment and page tables needs to be restricted
for the first KVM guest. For region and segment and page tables the first
KVM guest may read the memory, but write attempt has to lead to an
unshadow.  This is done using the page invalid and read-only bits in the
page table of the first KVM guest. If the first guest re-accesses one of
the origin pages of a shadow, it gets a fault and the affected parts of
the shadow page table hierarchy needs to be removed again.

PGSTE tables don't have to be shadowed, as all interpretation assist can't
deal with the invalid bits in the shadow pte being set differently than
the original ones provided by the first KVM guest.

Many bug fixes and improvements by David Hildenbrand.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:04 +02:00
Martin Schwidefsky
6ea427bbbd s390/mm: add reference counter to gmap structure
Let's use a reference counter mechanism to control the lifetime of
gmap structures. This will be needed for further changes related to
gmap shadows.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:53:59 +02:00
Martin Schwidefsky
b2d73b2a0a s390/mm: extended gmap pte notifier
The current gmap pte notifier forces a pte into to a read-write state.
If the pte is invalidated the gmap notifier is called to inform KVM
that the mapping will go away.

Extend this approach to allow read-write, read-only and no-access
as possible target states and call the pte notifier for any change
to the pte.

This mechanism is used to temporarily set specific access rights for
a pte without doing the heavy work of a true mprotect call.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:49 +02:00
Martin Schwidefsky
8ecb1a59d6 s390/mm: use RCU for gmap notifier list and the per-mm gmap list
The gmap notifier list and the gmap list in the mm_struct change rarely.
Use RCU to optimize the reader of these lists.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:49 +02:00
Martin Schwidefsky
414d3b0749 s390/kvm: page table invalidation notifier
Pass an address range to the page table invalidation notifier
for KVM. This allows to notify changes that affect a larger
virtual memory area, e.g. for 1MB pages.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:48 +02:00
Hendrik Brueckner
9254e70c4e s390/cpum_cf: use perf software context for hardware counters
On s390, there are two different hardware PMUs for counting and
sampling.  Previously, both PMUs have shared the perf_hw_context
which is not correct and, recently, results in this warning:

    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 1 at kernel/events/core.c:8485 perf_pmu_register+0x420/0x428
    Modules linked in:
    CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc1+ #2
    task: 00000009c5240000 ti: 00000009c5234000 task.ti: 00000009c5234000
    Krnl PSW : 0704c00180000000 0000000000220c50 (perf_pmu_register+0x420/0x428)
               R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    Krnl GPRS: ffffffffffffffff 0000000000b15ac6 0000000000000000 00000009cb440000
               000000000022087a 0000000000000000 0000000000b78fa0 0000000000000000
               0000000000a9aa90 0000000000000084 0000000000000005 000000000088a97a
               0000000000000004 0000000000749dd0 000000000022087a 00000009c5237cc0
    Krnl Code: 0000000000220c44: a7f4ff54            brc     15,220aec
               0000000000220c48: 92011000           mvi     0(%r1),1
              #0000000000220c4c: a7f40001           brc     15,220c4e
              >0000000000220c50: a7f4ff12           brc     15,220a74
               0000000000220c54: 0707               bcr     0,%r7
               0000000000220c56: 0707               bcr     0,%r7
               0000000000220c58: ebdff0800024       stmg    %r13,%r15,128(%r15)
               0000000000220c5e: a7f13fe0           tmll    %r15,16352
    Call Trace:
    ([<000000000022087a>] perf_pmu_register+0x4a/0x428)
    ([<0000000000b2c25c>] init_cpum_sampling_pmu+0x14c/0x1f8)
    ([<0000000000100248>] do_one_initcall+0x48/0x140)
    ([<0000000000b25d26>] kernel_init_freeable+0x1e6/0x2a0)
    ([<000000000072bda4>] kernel_init+0x24/0x138)
    ([<000000000073495e>] kernel_thread_starter+0x6/0xc)
    ([<0000000000734958>] kernel_thread_starter+0x0/0xc)
    Last Breaking-Event-Address:
     [<0000000000220c4c>] perf_pmu_register+0x41c/0x428
    ---[ end trace 0c6ef9f5b771ad97 ]---

Using the perf_sw_context is an option because the cpum_cf PMU does
not use interrupts.  To make this more clear, initialize the
capabilities in the PMU structure.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-16 12:08:49 +02:00
Peter Zijlstra
b53d6bedbe locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
Since all architectures have this implemented now natively, remove this
dead code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-16 10:48:32 +02:00
Peter Zijlstra
56fefbbc3f locking/atomic, arch/s390: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
Implement FETCH-OP atomic primitives, these are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.

This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-16 10:48:29 +02:00
Paolo Bonzini
a03825bbd0 KVM: s390: use kvm->created_vcpus
The new created_vcpus field avoids possible races between enabling
capabilities and creating VCPUs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 10:07:37 +02:00
Heiko Carstens
11a7752e01 s390: remove math emulation code
The last in-kernel user is gone so we can finally remove this code.

Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:11 +02:00
Heiko Carstens
9c203239c5 s390: calculate loops_per_jiffies with fp instructions
Implement calculation of loops_per_jiffies with fp instructions which
are available on all 64 bit machines.
To save and restore floating point register context use the new vx support
functions.

Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:10 +02:00
Hendrik Brueckner
a086171ad8 s390: Updated kernel config files
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:07 +02:00
Hendrik Brueckner
f848dbd3bc s390/crc32-vx: add crypto API module for optimized CRC-32 algorithms
Add a crypto API module to access the vector extension based CRC-32
implementations.  Users can request the optimized implementation through
the shash crypto API interface.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:36:34 +02:00
Paolo Bonzini
f26ed98326 KVM: s390: Features and fixes for 4.8 part1
Four bigger things:
 1. The implementation of the STHYI opcode in the kernel. This is used
    in libraries like qclib [1] to provide enough information for a
    capacity and usage based software licence pricing. The STHYI content
    is defined by the related z/VM documentation [2]. Its data can be
    composed by accessing several other interfaces provided by LPAR or
    the machine. This information is partially sensitive or root-only
    so the kernel does the necessary filtering.
 2. Preparation for nested virtualization (VSIE). KVM should query the
    proper sclp interfaces for the availability of some features before
    using it. In the past we have been sloppy and simply assumed that
    several features are available. With this we should be able to handle
    most cases of a missing feature.
 3. CPU model interfaces extended by some additional features that are
    not covered by a facility bit in STFLE. For example all the crypto
    instructions of the coprocessor provide a query function. As reality
    tends to be more complex (e.g. export regulations might block some
    algorithms) we have to provide additional interfaces to query or
    set these non-stfle features.
 4. Several fixes and changes detected and fixed when doing 1-3.
 
 All features change base s390 code. All relevant patches have an ACK
 from the s390 or component maintainers.
 
 The next pull request for 4.8 (part2) will contain the implementation
 of VSIE.
 
 [1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
 [2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJXX+A6AAoJEBF7vIC1phx8SBoQAIkFTMxoGvY9lFkkreUXIyeX
 XL0grybhsaKd4tT80FlobTl2ejpo/feRl5RfD5Oi75UCR4oMuk3Eb8bIyQjcKJvS
 7sYFz+zP9TZ5S/rxvc3EanXpcNnfowKDuLUyOTaq0Hq8XQHaSwzYGGbtPgTdMDAp
 DyhwNhYK8cPvmBS3KHX70ZOMfl9J4s0xvgs42BRJyyDGYrJOZcN1NLsG2l1dAb0L
 au/Svb05PxhgQvqoUId3VSrmRKLm9tSk5DJdIRcmj1+4Mlhfw14LTV+wGuTLTgSZ
 GOyEdum2E/b4QABWca7sxmgqo+Wo5voOW+WKOGLMiN2sK+JwvSnu4qmiRG/qgFCJ
 EQDZer+OEQTu+YgZzjm/r5wbIkV/gqUenjjepk5iWrxK6EB7CmlQuZyyEKm3wO7i
 LrEDqRU7SY+PuUu+Ov6/PHxmMy5DJuK+AedRe8uzuDSmYpSekYFLD44gctkPe56q
 uq4Fhx3g3EIkPMcHnAae92vHLp/INCHCGoPb4Xh6CnaP4Xm+RntCv2hWxw30rHgc
 IIYVy4fSyJuTeHpFcNgeBrbcx4jwvkfJ9kxezM864DA9hBBfcS3ZZDhLM5PPEaLr
 usu7Gt6nHeFtwvXxZn/Y+SsYWCWpmbt6An/m+lqf05aAqyndhbwJ8Kftz3OAxKDw
 b7o59x2wvV9dfakAHxNx
 =fdBQ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features and fixes for 4.8 part1

Four bigger things:
1. The implementation of the STHYI opcode in the kernel. This is used
   in libraries like qclib [1] to provide enough information for a
   capacity and usage based software licence pricing. The STHYI content
   is defined by the related z/VM documentation [2]. Its data can be
   composed by accessing several other interfaces provided by LPAR or
   the machine. This information is partially sensitive or root-only
   so the kernel does the necessary filtering.
2. Preparation for nested virtualization (VSIE). KVM should query the
   proper sclp interfaces for the availability of some features before
   using it. In the past we have been sloppy and simply assumed that
   several features are available. With this we should be able to handle
   most cases of a missing feature.
3. CPU model interfaces extended by some additional features that are
   not covered by a facility bit in STFLE. For example all the crypto
   instructions of the coprocessor provide a query function. As reality
   tends to be more complex (e.g. export regulations might block some
   algorithms) we have to provide additional interfaces to query or
   set these non-stfle features.
4. Several fixes and changes detected and fixed when doing 1-3.

All features change base s390 code. All relevant patches have an ACK
from the s390 or component maintainers.

The next pull request for 4.8 (part2) will contain the implementation
of VSIE.

[1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
[2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
2016-06-15 09:21:46 +02:00
Kees Cook
0208b9445b s390/ptrace: run seccomp after ptrace
Close the hole where ptrace can change a syscall out from under seccomp.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
2016-06-14 10:54:45 -07:00
Andy Lutomirski
2f275de5d1 seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to
seccomp (which is generally much faster than having seccomp do it
using the syscall_get_xyz() API), it has to use the two-phase
seccomp hooks. Add it to the easy hooks, too.

Cc: linux-arch@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-06-14 10:54:39 -07:00
Hendrik Brueckner
19c93787f5 s390/crc32-vx: use vector instructions to optimize CRC-32 computation
Use vector instructions to optimize the computation of CRC-32 checksums.
An optimized version is provided for CRC-32 (IEEE 802.3 Ethernet) in
normal and bitreflected domain, as well as, for bitreflected CRC-32C
(Castagnoli).

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:16 +02:00
Hendrik Brueckner
0486480802 s390/vx: add support functions for in-kernel FPU use
Introduce the kernel_fpu_begin() and kernel_fpu_end() function
to enclose any in-kernel use of FPU instructions and registers.
In enclosed sections, you can perform floating-point or vector
(SIMD) computations.  The functions take care of saving and
restoring FPU register contents and controls.

For usage details, see the guidelines in arch/s390/include/asm/fpu/api.h

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:11 +02:00
Heiko Carstens
de3fa841e4 s390/mm: fix compile for PAGE_DEFAULT_KEY != 0
The usual problem for code that is ifdef'ed out is that it doesn't
compile after a while. That's also the case for the storage key
initialisation code, if it would be used (set PAGE_DEFAULT_KEY to
something not zero):

./arch/s390/include/asm/page.h: In function 'storage_key_init_range':
./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range'

Since the code itself has been useful for debugging purposes several
times, remove the ifdefs and make sure the code gets compiler
coverage. The cost for this is eight bytes.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:05 +02:00
Peter Zijlstra
726328d92a locking/spinlock, arch: Update and fix spin_unlock_wait() implementations
This patch updates/fixes all spin_unlock_wait() implementations.

The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.

This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.

I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.

Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.

I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.

Note: PPC32 will be fixed independently

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14 11:55:15 +02:00
Andrea Gelmini
960cb306e6 KVM: S390: Fix typo
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-14 11:16:27 +02:00
Heiko Carstens
86d18a55dd s390/topology: remove z10 special handling
I don't have a z10 to test this anymore, so I have no idea if the code
works at all or even crashes. I can try to emulate, but it is just
guess work.

Nor do we know if the z10 special handling is performance wise still
better than the generic handling. There have been a lot of changes to
the scheduler.

Therefore let's play safe and remove the special handling.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:27 +02:00
Heiko Carstens
adac0f1e8c s390/topology: add drawer scheduling domain level
The z13 machine added a fourth level to the cpu topology
information. The new top level is called drawer.

A drawer contains two books, which used to be the top level.

Adding this additional scheduling domain did show performance
improvements for some workloads of up to 8%, while there don't
seem to be any workloads impacted in a negative way.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:27 +02:00
Heiko Carstens
0599eead58 s390/ipl: rename diagnose enums
Rename DIAG308_IPL and DIAG308_DUMP to DIAG308_LOAD_CLEAR and
DIAG308_LOAD_NORMAL_DUMP to better reflect the associated IPL
functions.

Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens
0f7451ff3a s390/ipl: use load normal for LPAR re-ipl
Avoid clearing memory for CCW-type re-ipl within a logical
partition. This can save a significant amount of time if a logical
partition contains a lot of memory.

On the other hand we still clear memory if running within a second
level hypervisor, since the hypervisor can simply free all memory that
was used for the guest.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens
6c22c98637 s390: avoid extable collisions
We have some inline assemblies where the extable entry points to a
label at the end of an inline assembly which is not followed by an
instruction.

On the other hand we have also inline assemblies where the extable
entry points to the first instruction of an inline assembly.

If a first type inline asm (extable point to empty label at the end)
would be directly followed by a second type inline asm (extable points
to first instruction) then we would have two different extable entries
that point to the same instruction but would have a different target
address.

This can lead to quite random behaviour, depending on sorting order.

I verified that we currently do not have such collisions within the
kernel. However to avoid such subtle bugs add a couple of nop
instructions to those inline assemblies which contain an extable that
points to an empty label.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens
ee64baf4ea s390/uaccess: use __builtin_expect for get_user/put_user
We always expect that get_user and put_user return with zero. Give the
compiler a hint so it can slightly optimize the code and avoid
branches.
This is the same what x86 got with commit a76cf66e94 ("x86/uaccess:
Tell the compiler that uaccess is unlikely to fault").

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:25 +02:00
Heiko Carstens
b8ac5e2f4d s390/uaccess: fix whitespace damage
Fix some whitespace damage that was introduced by me with a
query-replace when removing 31 bit support.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:25 +02:00
Sebastian Ott
8ee2db3cf1 s390/pci: ensure to not cross a dma segment boundary
When we use the iommu_area_alloc helper to get dma addresses
we specify the boundary_size parameter but not the offset (called
shift in this context).

As long as the offset (start_dma) is a multiple of the boundary
we're ok (on current machines start_dma always seems to be 4GB).

Don't leave this to chance and specify the offset for iommu_area_alloc.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Sebastian Ott
53b1bc9aba s390/pci: ensure page aligned dma start address
We don't have an architectural guarantee on the value of
the dma offset but rely on it to be at least page aligned.
Enforce page alignemt of start_dma.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Sebastian Ott
bb98f396f1 s390: use SPARSE_IRQ
Use dynamically allocated irq descriptors on s390 which allows
us to get rid of the s390 specific config option PCI_NR_MSI and
exploit more MSI interrupts. Also the size of the kernel image
is reduced by 131K (using performance_defconfig).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Heiko Carstens
72a9b02d3b s390: use __section macro everywhere
Small cleanup patch to use the shorter __section macro everywhere.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:23 +02:00
Heiko Carstens
d07a980c1b s390: add proper __ro_after_init support
On s390 __ro_after_init is currently mapped to __read_mostly which
means that data marked as __ro_after_init will not be protected.

Reason for this is that the common code __ro_after_init implementation
is x86 centric: the ro_after_init data section was added to rodata,
since x86 enables write protection to kernel text and rodata very
late. On s390 we have write protection for these sections enabled with
the initial page tables. So adding the ro_after_init data section to
rodata does not work on s390.

In order to make __ro_after_init work properly on s390 move the
ro_after_init data, right behind rodata. Unlike the rodata section it
will be marked read-only later after all init calls happened.

This s390 specific implementation adds new __start_ro_after_init and
__end_ro_after_init labels. Everything in between will be marked
read-only after the init calls happened. In addition to the
__ro_after_init data move also the exception table there, since from a
practical point of view it fits the __ro_after_init requirements.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:23 +02:00
Martin Schwidefsky
64f31d5802 s390/mm: simplify the TLB flushing code
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
decide between a lazy TLB flush vs an immediate TLB flush. The field
contains two 16-bit counters, the number of CPUs that have the mm
attached and can create TLB entries for it and the number of CPUs in
the middle of a page table update.

The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
use the attach counter and a mask check with mm_cpumask(mm) to decide
between a local flush local of the current CPU and a global flush.

For all these functions the decision between lazy vs immediate and
local vs global TLB flush can be based on CPU masks. There are two
masks:  the mm->context.cpu_attach_mask with the CPUs that are actively
using the mm, and the mm_cpumask(mm) with the CPUs that have used the
mm since the last full flush. The decision between lazy vs immediate
flush is based on the mm->context.cpu_attach_mask, to decide between
local vs global flush the mm_cpumask(mm) is used.

With this patch all checks will use the CPU masks, the old counter
mm->context.attach_count with its two 16-bit values is turned into a
single counter mm->context.flush_count that keeps track of the number
of CPUs with incomplete page table updates. The sole user of this
counter is finish_arch_post_lock_switch() which waits for the end of
all page table updates.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:22 +02:00
Martin Schwidefsky
a9809407f6 s390/mm: fix vunmap vs finish_arch_post_lock_switch
The vunmap_pte_range() function calls ptep_get_and_clear() without any
locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct()
for the page table update. ptep_flush_direct requires that preemption
is disabled, but without any locking this is not the case. If the kernel
preempts the task while the attach_counter is increased an endless loop
in finish_arch_post_lock_switch() will occur the next time the task is
scheduled.

Add explicit preempt_disable()/preempt_enable() calls to the relevant
functions in arch/s390/mm/pgtable.c.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:21 +02:00