Commit graph

4134 commits

Author SHA1 Message Date
Hendrik Brueckner
0ac277790e s390/fpu: add static FPU save area for init_task
Previously, the init task did not have an allocated FPU save area and
saving an FPU state was not possible.  Now if the vector extension is
always enabled, provide a static FPU save area to save FPU states of
vector instructions that can be executed quite early.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Hendrik Brueckner
b5510d9b68 s390/fpu: always enable the vector facility if it is available
If the kernel detects that the s390 hardware supports the vector
facility, it is enabled by default at an early stage.  To force
it off, use the novx kernel parameter.  Note that there is a small
time window, where the vector facility is enabled before it is
forced to be off.

With enabling the vector facility by default, the FPU save and
restore functions can be improved.  They do not longer require
to manage expensive control register updates to enable or disable
the vector enablement control for particular processes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Martin Schwidefsky
395e6aa1d0 s390/mm: try to avoid storage key operation in ptep_set_access_flags
The call to pgste_set_key in ptep_set_access_flags can be avoided
if the old pte is found to be valid at the time the new access
rights are set. The function that created the old, valid pte already
completed the required storage key operation.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:07 +02:00
Martin Schwidefsky
5da7667c03 s390/barrier: remove unnecessary serialization in atomics and bitops
The principles of operation states reads are in order, writes are in
order, writes can be reordered after reads, but no reads can be
reordered after writes.

The atomic and bitops variantes for z196 use the interlocked-access
facility instructions with a memory barrier before and after the
instruction. Because of the memory ordering the first barrier is
unnecessary and can be removed.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:07 +02:00
Martin Schwidefsky
b5a6b71b19 s390/diag: add tracepoint for diagnose calls
To be able to analyse problems in regard to hypervisor overhead
add a tracepoing for diagnose calls. It reports the number of
the diagnose issued, e.g.

            sshd-1385  [002] ....    42.701431: diagnose: nr=0x9c
          <idle>-0     [001] ..s.    43.587528: diagnose: nr=0x9c

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky
1ec2772e0c s390/diag: add a statistic for diagnose calls
Introduce /sys/debug/kernel/diag_stat with a statistic how many diagnose
calls have been done by each CPU in the system.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky
acdc9fc9a8 s390/bitops: implement cache friendly test_and_set_bit_lock
The generic implementation for test_and_set_bit_lock in include/asm-generic
uses the standard test_and_set_bit operation. This is done with either a
'csg' or a 'loag' instruction. For both version the cache line is fetched
exclusively, even if the bit is already set. The result is an increase in
cache traffic, for a contented lock this is a bad idea.

Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:06 +02:00
Martin Schwidefsky
5614dd920a s390/mm: implement soft-dirty bits for user memory change tracking
Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty
bit. The fault mechanism to do dirty tracking is already in place.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:05 +02:00
Sebastian Ott
9d49f86dab s390/cio: introduce pathmask_to_pos
We often need to correlate an 8 bit path mask with the position
in a channel path array. Introduce and use pathmask_to_pos for
that task.

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:04 +02:00
Sebastian Ott
1bc6664bdf s390/cio: use device_lock during cmb activation
Hold the device_lock during [de]activation of the channel measurement
block to synchronize concurrent usage of these functions.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:02 +02:00
Alexander Kuleshov
3c4aac86cb s390/crash_dump: use for_each_mem_range
The <linux/memblock.h> already provides for_each_mem_range() macro that
iterates through memblock areas from type_a and not included in type_b.
We can remove custom for_each_dump_mem_range() macro and use the
for_each_mem_range() instead.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger
1afc82aee4 s390/barrier: avoid serialization in [smp_]rmb and [smp_]wmb
The principles of operation says:

The storage-operand fetch references of one instruction
occur after those of all preceding instructions and
before those of subsequent instructions, as observed
by other CPUs and by channel programs.
[...]
The CPU may fetch the operands of instructions before the
instructions are executed.
[...]
The CPU may delay placing results in storage.
[...]
the results of one instruction are placed in storage after
the results of all preceding instructions have been placed
in storage and before any results of the succeeding
instructions are stored, as observed by other CPUs and by
the channel subsystem.

which boils down to:
- reads are in order
- writes are in order
- reads can happen earlier
- writes can happen later

By definition (see memory-barrier.txt) read barriers orders
reads vs reads and write barriers orders writes agains writes.
but neither of these orders reads vs. writes.

That means we can implement smp_wmb,smp_rmb,wmb and rmb as
simple compiler barriers. To avoid reviewing all driver code
for correct barrier usage we keep dma_[rw]mb as serialization
for now.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger
33b5881d11 s390/vdso: use correct memory barrier
By definition smp_wmb only orders writes against writes. (Finish all
previous writes, and do not start any future write). To protect the
vdso init code against early reads on other CPUs, let's use a full
smp_mb at the end of vdso init. As right now smp_wmb is implemented
as full serialization, this needs no stable backport, but this change
will be necessary if we reimplement smp_wmb.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:01 +02:00
Christian Borntraeger
e0af21c56d s390/spinlock: use correct barriers
_raw_write_lock_wait first sets the high order bit to indicate a
pending writer and then waits for the reader to drop to zero.
smp_rmb by definition only orders reads against reads. Let's use
a full smp_mb instead. As right now smp_rmb is implemented
as full serialization, this needs no stable backport, but this
patch will be necessary if we reimplement smp_rmb.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:00 +02:00
Michael Holzheu
b02064a9b8 s390/numa: write kernel message when emu_size has been increased
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:31:59 +02:00
Linus Torvalds
3ec20e2e61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes and an update to the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/defconfig: set SCSI_DH=y
  s390/vtime: correct scaled cputime of partially idle CPUs
  s390/boot/decompression: disable floating point in decompressor
  s390/numa: use correct type for node_to_cpumask_map
2015-10-06 14:59:36 +01:00
Linus Torvalds
30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
Sebastian Ott
daad0bf149 s390/defconfig: set SCSI_DH=y
Fix this warning:
arch/s390/configs/performance_defconfig:380:warning: symbol value 'm' invalid for SCSI_DH

Introduced via 086b91d052
(scsi_dh: integrate into the core SCSI code)

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-01 10:48:36 +02:00
Martin Schwidefsky
72d38b1978 s390/vtime: correct scaled cputime of partially idle CPUs
The calculation for the SMT scaling factor for a hardware thread
which has been partially idle needs to disregard the cycles spent
by the other threads of the core while the thread is idle.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-30 16:22:38 +02:00
Christian Borntraeger
adc0b7fbf6 s390/boot/decompression: disable floating point in decompressor
my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for
spilling/filling into a floating point register in our decompressor.

This will cause an AFP-register data exception as the decompressor
did not setup the additional floating point registers via cr0.
That causes a program check loop that looked like a hang with
one "Uncompressing Linux... " message (directly booted via kvm)
or a loop of "Uncompressing Linux... " messages (when booted via
zipl boot loader).

The offending code in my build was

   48e400:       e3 c0 af ff ff 71       lay     %r12,-1(%r10)
-->48e406:       b3 c1 00 1c             ldgr    %f1,%r12
   48e40a:       ec 6c 01 22 02 7f       clij    %r6,2,12,0x48e64e

but gcc could do spilling into an fpr at any function. We can
simply disable floating point support at that early stage.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
2015-09-29 14:45:10 +02:00
Linus Torvalds
b6d980f493 AMD fixes for bugs introduced in the 4.2 merge window,
and a few PPC bug fixes too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2
 AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz
 KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ
 tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg
 Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6
 hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM=
 =Ywqn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
  bug fixes too"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: disable halt_poll_ns as default for s390x
  KVM: x86: fix off-by-one in reserved bits check
  KVM: x86: use correct page table format to check nested page table reserved bits
  KVM: svm: do not call kvm_set_cr0 from init_vmcb
  KVM: x86: trap AMD MSRs for the TSeg base and mask
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
  kvm: svm: reset mmu on VCPU reset
2015-09-25 10:51:40 -07:00
David Hildenbrand
920552b213 KVM: disable halt_poll_ns as default for s390x
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25 10:31:30 +02:00
Martin Schwidefsky
22be9cd9f2 s390/numa: use correct type for node_to_cpumask_map
With CONFIG_CPUMASK_OFFSTACK=y cpumask_var_t is a pointer to a CPU mask.
Replace the incorrect type for node_to_cpumask_map with cpumask_t.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-23 09:18:56 +02:00
Linus Torvalds
90a835f5d1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "A couple of system call updates.  The two new system calls userfaultfd
  and membarrier have been added, as well as the 17 direct calls for the
  multiplexed socket system calls.

  In addition the system call compat wrappers have been flagged as
  notrace functions and a few wrappers could be removed.

  And bug fixes for the vector register handling, cpu_mf, suspend/resume,
  compat signals, SMT cputime accounting and the zfcp dumper"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: wire up separate socketcalls system calls
  s390/compat: remove superfluous compat wrappers
  s390/compat: do not trace compat wrapper functions
  s390/s390x: allocate sys_membarrier system call number
  s390/configs//zfcpdump_defconfig: Remove CONFIG_MEMSTICK
  s390: wire up userfaultfd system call
  s390/vtime: correct scaled cputime for SMT
  s390/cpum_cf: Corrected return code for unauthorized counter sets
  s390/compat: correct uc_sigmask of the compat signal frame
  s390: fix floating point register corruption
  s390/hibernate: fix save and restore of vector registers
2015-09-21 09:53:30 -07:00
Heiko Carstens
977108f89c s390: wire up separate socketcalls system calls
As discussed on linux-arch all architectures should wire up the separate
system calls that are hidden behind the socketcall multiplexer system call.

It's just a couple more system calls and gives us a very small performance
improvement.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-18 11:16:53 +02:00
Heiko Carstens
7681df456f s390/compat: remove superfluous compat wrappers
A couple of compat wrapper functions are simply trampolines to the real
system call. This happened because the compat wrapper defines will only
sign and zero extend system call parameters which are of different size
on s390/s390x (longs and pointers).
All other parameters will be correctly sign and zero extended by the
normal system call wrappers.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-18 11:16:52 +02:00
Heiko Carstens
a55b2ae78c s390/compat: do not trace compat wrapper functions
Add notrace to the compat wrapper define to disable tracing of compat
wrapper functions. These are supposed to be very small and more or less
just a trampoline to the real system call.

Also fix indentation.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-18 11:16:50 +02:00
Mathieu Desnoyers
96d4ee8aaa s390/s390x: allocate sys_membarrier system call number
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-api@vger.kernel.org
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:44 +02:00
Michael Holzheu
3dc636b2da s390/configs//zfcpdump_defconfig: Remove CONFIG_MEMSTICK
This config option is completely irrelevant for zfcpdump and
unfortunately causes a kernel panic on recent kernels in
"mspro_block_init()/driver_register()".

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:44 +02:00
Heiko Carstens
022435713c s390: wire up userfaultfd system call
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:43 +02:00
Martin Schwidefsky
61cc37906b s390/vtime: correct scaled cputime for SMT
The scaled cputime is supposed to be derived from the normal per-thread
cputime by dividing it with the average thread density in the last interval.

The calculation of the scaling values for the average thread density is
incorrect. The current, incorrect calculation:

    Ci = cycle count with i active threads
    T = unscaled cputime, sT = scaled cputime
    sT = T * (C1 + C2 + ... + Cn) / (1*C1 + 2*C2 + ... + n*Cn)

The calculation happens to yield the correct numbers for the simple cases
with only one Ci value not zero. But for cases with multiple Ci values not
zero it fails. E.g. on a SMT-2 system with one thread active half the time
and two threads active for the other half of the time it fails, the scaling
factor should be 3/4 but the formula gives 2/3.

The correct formula is

    sT = T * (C1/1 + C2/2 + ... + Cn/n) / (C1 + C2 + ... + Cn)

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:43 +02:00
Hendrik Brueckner
58f8e9da33 s390/cpum_cf: Corrected return code for unauthorized counter sets
Previously, the cpum_cf PMU returned -EPERM if a counter is requested and
the counter set to which the counter belongs is not authorized.  According
to the perf_event_open() system call manual, an error code of EPERM indicates
an unsupported exclude setting or CAP_SYS_ADMIN is missing.

Use ENOENT to indicate that particular counters are not available when the
counter set which contains the counter is not authorized.  For generic events,
this might trigger a fall back, for example, to a software event.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:42 +02:00
Martin Schwidefsky
8d4bd0ed04 s390/compat: correct uc_sigmask of the compat signal frame
The uc_sigmask in the ucontext structure is an array of words to keep
the 64 signal bits (or 1024 if you ask glibc but the kernel sigset_t
only has 64 bits).

For 64 bit the sigset_t contains a single 8 byte word, but for 31 bit
there are two 4 byte words. The compat signal handler code uses a
simple copy of the 64 bit sigset_t to the 31 bit compat_sigset_t.
As s390 is a big-endian architecture this is incorrect, the two words
in the 31 bit sigset_t array need to be swapped.

Cc: <stable@vger.kernel.org>
Reported-by: Stefan Liebler <stli@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:42 +02:00
Heiko Carstens
9380cf5a88 s390: fix floating point register corruption
The critical section cleanup code misses to add the offset of the
thread_struct to the task address.
Therefore, if the critical section code gets executed, it may corrupt
the task struct or restore the contents of the floating point registers
from the wrong memory location.
Fixes d0164ee20d "s390/kernel: remove save_fpu_regs() parameter and use
__LC_CURRENT instead".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:41 +02:00
Martin Schwidefsky
88d6425378 s390/hibernate: fix save and restore of vector registers
The swsusp_arch_suspend()/swsusp_arch_resume() functions currently only
save and restore the floating point registers. If the task that started
the hibernation process is using vector registers they can get lost.
To fix this just call save_fpu_regs in swsusp_arch_suspend(), the restore
will happen automatically on return to user space.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:41 +02:00
Jason J. Herne
9bf9fde2c9 KVM: s390: Replace incorrect atomic_or with atomic_andnot
The offending commit accidentally replaces an atomic_clear with an
atomic_or instead of an atomic_andnot in kvm_s390_vcpu_request_handled.
The symptom is that kvm guests on s390 hang on startup.
This patch simply replaces the incorrect atomic_or with atomic_andnot

Fixes: 805de8f43c (atomic: Replace atomic_{set,clear}_mask() usage)
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 17:01:06 +02:00
Paolo Bonzini
62bea5bff4 KVM: add halt_attempted_poll to VCPU stats
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.

For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling.  This would show as an abnormally high number of
attempted polling compared to the successful polls.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 12:17:00 +02:00
Christoph Hellwig
452e06af1f dma-mapping: consolidate dma_set_mask
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.

This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation.  Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.

Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
ee196371d5 dma-mapping: consolidate dma_supported
Most architectures just call into ->dma_supported, but some also return 1
if the method is not present, or 0 if no dma ops are present (although
that should never happeb). Consolidate this more broad version into
common code.

Also fix h8300 which inorrectly always returned 0, which would have been
a problem if it's dma_set_mask implementation wasn't a similarly buggy
noop.

As a few architectures have much more elaborate implementations, we
still allow for arch overrides.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
efa21e432c dma-mapping: cosolidate dma_mapping_error
Currently there are three valid implementations of dma_mapping_error:

 (1) call ->mapping_error
 (2) check for a hardcoded error code
 (3) always return 0

This patch provides a common implementation that calls ->mapping_error
if present, then checks for DMA_ERROR_CODE if defined or otherwise
returns 0.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
1e8937526e dma-mapping: consolidate dma_{alloc,free}_noncoherent
Most architectures do not support non-coherent allocations and either
define dma_{alloc,free}_noncoherent to their coherent versions or stub
them out.

Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
implements them directly.

This patch moves the Openrisc version to common code, and handles the
DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.

Note that actual non-coherent allocations require a dma_cache_sync
implementation, so if non-coherent allocations didn't work on
an architecture before this patch they still won't work after it.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
6894258eda dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.

This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.

This patch (of 5):

The coherent DMA allocator works the same over all architectures supporting
dma_map operations.

This patch consolidates them and converges the minor differences:

 - the debug_dma helpers are now called from all architectures, including
   those that were previously missing them
 - dma_alloc_from_coherent and dma_release_from_coherent are now always
   called from the generic alloc/free routines instead of the ops
   dma-mapping-common.h always includes dma-coherent.h to get the defintions
   for them, or the stubs if the architecture doesn't support this feature
 - checks for ->alloc / ->free presence are removed.  There is only one
   magic instead of dma_map_ops without them (mic_dma_ops) and that one
   is x86 only anyway.

Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags.  An optional arch hook is provided
for that.

[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Yinghai Lu
2d3862d26e lib/decompressors: use real out buf size for gunzip with kernel
When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
gunzip error.

| early console in decompress_kernel
| decompress_kernel:
|       input: [0x807f2143b4-0x807ff61aee]
|      output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
| boot via startup_64
| KASLR using RDTSC...
|  new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
|  decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
|
| Decompressing Linux... gz...
|
| uncompression error
|
| -- System halted

the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
0xffffffb901ffffff as out_len.  gunzip in lib/zlib_inflate/inflate.c cap
that len to 0x01ffffff and decompress fails later.

We could hit this problem with crashkernel booting that uses kexec loading
kernel above 4GiB.

We have decompress_* support:
    1. inbuf[]/outbuf[] for kernel preboot.
    2. inbuf[]/flush() for initramfs
    3. fill()/flush() for initrd.
This bug only affect kernel preboot path that use outbuf[].

Add __decompress and take real out_buf_len for gunzip instead of guessing
wrong buf size.

Fixes: 1431574a1c (lib/decompressors: fix "no limit" output buffer length)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Linus Torvalds
dd5cdb48ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Another merge window, another set of networking changes.  I've heard
  rumblings that the lightweight tunnels infrastructure has been voted
  networking change of the year.  But what do I know?

   1) Add conntrack support to openvswitch, from Joe Stringer.

   2) Initial support for VRF (Virtual Routing and Forwarding), which
      allows the segmentation of routing paths without using multiple
      devices.  There are some semantic kinks to work out still, but
      this is a reasonably strong foundation.  From David Ahern.

   3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

   4) Ignore route nexthops with a link down state in ipv6, just like
      ipv4.  From Andy Gospodarek.

   5) Remove spinlock from fast path of act_gact and act_mirred, from
      Eric Dumazet.

   6) Document the DSA layer, from Florian Fainelli.

   7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
      from Florian Fainelli.

   8) Add Mellanox Switch Driver and core infrastructure, from Jiri
      Pirko.

   9) Add support for "light weight tunnels", which allow for
      encapsulation and decapsulation without bearing the overhead of a
      full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
      others.

  10) Add Identifier Locator Addressing support for ipv6, from Tom
      Herbert.

  11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

  12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

  13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

  14) Stop using a zero TX queue length to mean that a device shouldn't
      have a qdisc attached, use an explicit flag instead.  From Phil
      Sutter.

  15) Use generic geneve netdevice infrastructure in openvswitch, from
      Pravin B Shelar.

  16) Add infrastructure to avoid re-forwarding a packet in software
      that was already forwarded by a hardware switch.  From Scott
      Feldman.

  17) Allow AF_PACKET fanout function to be implemented in a bpf
      program, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
  netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
  netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
  net: fec: clear receive interrupts before processing a packet
  ipv6: fix exthdrs offload registration in out_rt path
  xen-netback: add support for multicast control
  bgmac: Update fixed_phy_register()
  sock, diag: fix panic in sock_diag_put_filterinfo
  flow_dissector: Use 'const' where possible.
  flow_dissector: Fix function argument ordering dependency
  ixgbe: Resolve "initialized field overwritten" warnings
  ixgbe: Remove bimodal SR-IOV disabling
  ixgbe: Add support for reporting 2.5G link speed
  ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
  ixgbe: support for ethtool set_rxfh
  ixgbe: Avoid needless PHY access on copper phys
  ixgbe: cleanup to use cached mask value
  ixgbe: Remove second instance of lan_id variable
  ixgbe: use kzalloc for allocating one thing
  flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
  ixgbe: Remove unused PCI bus types
  ...
2015-09-03 08:08:17 -07:00
Linus Torvalds
17e6b00ac4 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "This updated pull request does not contain the last few GIC related
  patches which were reported to cause a regression.  There is a fix
  available, but I let it breed for a couple of days first.

  The irq departement provides:

   - new infrastructure to support non PCI based MSI interrupts
   - a couple of new irq chip drivers
   - the usual pile of fixlets and updates to irq chip drivers
   - preparatory changes for removal of the irq argument from interrupt
     flow handlers
   - preparatory changes to remove IRQF_VALID"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources
  irqchip: Add bcm2836 interrupt controller for Raspberry Pi 2
  irqchip: Add documentation for the bcm2836 interrupt controller
  irqchip/bcm2835: Add support for being used as a second level controller
  irqchip/bcm2835: Refactor handle_IRQ() calls out of MAKE_HWIRQ
  PCI: xilinx: Fix typo in function name
  irqchip/gic: Ensure gic_cpu_if_up/down() programs correct GIC instance
  irqchip/gic: Only allow the primary GIC to set the CPU map
  PCI/MSI: pci-xgene-msi: Consolidate chained IRQ handler install/remove
  unicore32/irq: Prepare puv3_gpio_handler for irq argument removal
  tile/pci_gx: Prepare trio_handle_level_irq for irq argument removal
  m68k/irq: Prepare irq handlers for irq argument removal
  C6X/megamode-pic: Prepare megamod_irq_cascade for irq argument removal
  blackfin: Prepare irq handlers for irq argument removal
  arc/irq: Prepare idu_cascade_isr for irq argument removal
  sparc/irq: Use access helper irq_data_get_affinity_mask()
  sparc/irq: Use helper irq_data_get_irq_handler_data()
  parisc/irq: Use access helper irq_data_get_affinity_mask()
  mn10300/irq: Use access helper irq_data_get_affinity_mask()
  irqchip/i8259: Prepare i8259_irq_dispatch for irq argument removal
  ...
2015-09-01 14:33:35 -07:00
Linus Torvalds
5e359bf221 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Rather large, but nothing exiting:

   - new range check for settimeofday() to prevent that boot time
     becomes negative.
   - fix for file time rounding
   - a few simplifications of the hrtimer code
   - fix for the proc/timerlist code so the output of clock realtime
     timers is accurate
   - more y2038 work
   - tree wide conversion of clockevent drivers to the new callbacks"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
  hrtimer: Handle failure of tick_init_highres() gracefully
  hrtimer: Unconfuse switch_hrtimer_base() a bit
  hrtimer: Simplify get_target_base() by returning current base
  hrtimer: Drop return code of hrtimer_switch_to_hres()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  time: Introduce current_kernel_time64()
  time: Introduce struct itimerspec64
  time: Add the common weak version of update_persistent_clock()
  time: Always make sure wall_to_monotonic isn't positive
  time: Fix nanosecond file time rounding in timespec_trunc()
  timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
  cris/time: Migrate to new 'set-state' interface
  kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
  xtensa/time: Migrate to new 'set-state' interface
  unicore/time: Migrate to new 'set-state' interface
  um/time: Migrate to new 'set-state' interface
  sparc/time: Migrate to new 'set-state' interface
  sh/localtimer: Migrate to new 'set-state' interface
  score/time: Migrate to new 'set-state' interface
  s390/time: Migrate to new 'set-state' interface
  ...
2015-09-01 14:04:50 -07:00
Linus Torvalds
25525bea46 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The dominant change in this cycle was the continued work to isolate
  kernel drivers from MTRR legacies: this tree gets rid of all kernel
  internal driver interfaces to MTRRs (mostly by rewriting it to proper
  PAT interfaces), the only access left is the /proc/mtrr ABI.

  This work was done by Luis R Rodriguez.

  There's also some related PCI interface additions for which I've
  Cc:-ed Bjorn"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
  s390/io: Add pci_iomap_wc() and pci_iomap_wc_range()
  drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
  drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
  PCI: Add pci_iomap_wc() variants
  drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
  drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  PCI: Add pci_ioremap_wc_bar()
  x86/mm: Make kernel/check.c explicitly non-modular
  x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
  x86/mm/pat: Add comments to cachemode translation tables
  arch/*/io.h: Add ioremap_uc() to all architectures
  drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()
  drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC
  drivers/video/fbdev/atyfb: Clarify ioremap() base and length used
  drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper
  x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default
  ...
2015-09-01 10:07:40 -07:00