Commit graph

642388 commits

Author SHA1 Message Date
Chenbo Feng
3eb88807b2 bpf: skip unnecessary capability check
commit 0fa4fe85f4 upstream.

The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:26 +02:00
Daniel Borkmann
733a4e1af8 kbuild: disable clang's default use of -fmerge-all-constants
commit 87e0d4f0f3 upstream.

Prasad reported that he has seen crashes in BPF subsystem with netd
on Android with arm64 in the form of (note, the taint is unrelated):

  [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
  [ 4134.820925] Mem abort info:
  [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
  [ 4135.016736]   SET = 0, FnV = 0
  [ 4135.119820]   EA = 0, S1PTW = 0
  [ 4135.201431] Data abort info:
  [ 4135.301388]   ISV = 0, ISS = 0x00000021
  [ 4135.359599]   CM = 0, WnR = 0
  [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
  [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
  [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
  [ 4135.674610] Modules linked in:
  [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S      W       4.14.19+ #1
  [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
  [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
  [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
  [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
  [ 4135.769062] sp : ffffff801d4e3ce0
  [...]
  [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
  [ 4136.273746] Call trace:
  [...]
  [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
  [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
  [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
  [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
  [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
  [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88

Android's netd was basically pinning the uid cookie BPF map in BPF
fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
again resulting in above panic. Issue is that the map was wrongly
identified as a prog! Above kernel was compiled with clang 4.0,
and it turns out that clang decided to merge the bpf_prog_iops and
bpf_map_iops into a single memory location, such that the two i_ops
could then not be distinguished anymore.

Reason for this miscompilation is that clang has the more aggressive
-fmerge-all-constants enabled by default. In fact, clang source code
has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
to do so:

  Pointers with different bases cannot represent the same object.
  (Note that clang defaults to -fmerge-all-constants, which can
  lead to inconsistent results for comparisons involving the address
  of a constant; this generally doesn't matter in practice.)

The issue never appeared with gcc however, since gcc does not enable
-fmerge-all-constants by default and even *explicitly* states in
it's option description that using this flag results in non-conforming
behavior, quote from man gcc:

  Languages like C or C++ require each variable, including multiple
  instances of the same variable in recursive calls, to have distinct
  locations, so using this option results in non-conforming behavior.

There are also various clang bug reports open on that matter [1],
where clang developers acknowledge the non-conforming behavior,
and refer to disabling it with -fno-merge-all-constants. But even
if this gets fixed in clang today, there are already users out there
that triggered this. Thus, fix this issue by explicitly adding
-fno-merge-all-constants to the kernel's Makefile to generically
disable this optimization, since potentially other places in the
kernel could subtly break as well.

Note, there is also a flag called -fmerge-constants (not supported
by clang), which is more conservative and only applies to strings
and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
gcc's code, the two flags -fmerge-{all-,}constants share the same
variable internally, so when disabling it via -fno-merge-all-constants,
then we really don't merge any const data (e.g. strings), and text
size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).

  $ gcc -fverbose-asm -O2 foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
    -> foo.S doesn't list -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled

Thus, as a workaround we need to set both -fno-merge-all-constants
*and* -fmerge-constants in the Makefile in order for text size to
stay as is.

  [1] https://bugs.llvm.org/show_bug.cgi?id=18538

Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chenbo Feng <fengc@google.com>
Cc: Richard Smith <richard-llvm@metafoo.co.uk>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: linux-kernel@vger.kernel.org
Tested-by: Prasad Sodagudi <psodagud@codeaurora.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:26 +02:00
Shuah Khan
353f71fe3c selftests: x86: sysret_ss_attrs doesn't build on a PIE build
commit 3346a6a4e5 upstream.

sysret_ss_attrs fails to compile leading x86 test run to fail on systems
configured to build using PIE by default. Add -no-pie fix it.

Relocation might still fail if relocated above 4G. For now this change
fixes the build and runs x86 tests.

tools/testing/selftests/x86$ make
gcc -m64 -o .../tools/testing/selftests/x86/single_step_syscall_64 -O2
-g -std=gnu99 -pthread -Wall  single_step_syscall.c -lrt -ldl
gcc -m64 -o .../tools/testing/selftests/x86/sysret_ss_attrs_64 -O2 -g
-std=gnu99 -pthread -Wall  sysret_ss_attrs.c thunks.S -lrt -ldl
/usr/bin/ld: /tmp/ccS6pvIh.o: relocation R_X86_64_32S against `.text'
can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Makefile:49: recipe for target
'.../tools/testing/selftests/x86/sysret_ss_attrs_64' failed
make: *** [.../tools/testing/selftests/x86/sysret_ss_attrs_64] Error 1

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Dave Hansen
1443abc903 x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
commit 91c49c2deb upstream.

'si_pkey' is now #defined to be the name of the new siginfo field that
protection keys uses.  Rename it not to conflict.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171111001231.DFFC8285@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Eric W. Biederman
f41f8156ae signal/testing: Don't look for __SI_FAULT in userspace
commit d12fe87e62 upstream.

Fix the debug print statements in these tests where they reference
si_codes and in particular __SI_FAULT.  __SI_FAULT is a kernel
internal value and should never be seen by userspace.

While I am in there also fix si_code_str.  si_codes are an enumeration
there are not a bitmap so == and not & is the apropriate operation to
test for an si_code.

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 5f23f6d082 ("x86/pkeys: Add self-tests")
Fixes: e754aedc26 ("x86/mpx, selftests: Add MPX self test")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Andy Lutomirski
93b4839239 selftests/x86/protection_keys: Fix syscall NR redefinition warnings
commit 693cb5580f upstream.

On new enough glibc, the pkey syscalls numbers are available.  Check
first before defining them to avoid warnings like:

protection_keys.c:198:0: warning: "SYS_pkey_alloc" redefined

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1fbef53a9e6befb7165ff855fc1a7d4788a191d6.1509794321.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Dave Hansen
26e9852f9d selftests, x86, protection_keys: fix wrong offset in siginfo
commit 2195bff041 upstream.

The siginfo contains a bunch of information about the fault.
For protection keys, it tells us which protection key's
permissions were violated.

The wrong offset in here leads to reading garbage and thus
failures in the tests.

We should probably eventually move this over to using the
kernel's headers defining the siginfo instead of a hard-coded
offset.  But, for now, just do the simplest fix.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Nadav Amit
1e0fc7dba2 staging: lustre: ptlrpc: kfree used instead of kvfree
commit c3eec59659 upstream.

rq_reqbuf is allocated using kvmalloc() but released in one occasion
using kfree() instead of kvfree().

The issue was found using grep based on a similar bug.

Fixes: d7e09d0397 ("add Lustre file system client support")
Fixes: ee0ec1946e ("lustre: ptlrpc: Replace uses of OBD_{ALLOC,FREE}_LARGE")

Cc: Peng Tao <bergwolf@gmail.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: James Simmons <jsimmons@infradead.org>

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Linus Walleij
162daa2714 iio: ABI: Fix name of timestamp sysfs file
commit b9a3589332 upstream.

The name of the file is "current_timetamp_clock" not
"timestamp_clock".

Fixes: bc2b7dab62 ("iio:core: timestamping clock selection support")
Cc: Gregor Boirie <gregor.boirie@parrot.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Kan Liang
9c0d0a0c79 perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers
commit 320b0651f3 upstream.

The number of CHAs is miscalculated on multi-domain PCI Skylake server systems,
resulting in an uncore driver initialization error.

Gary Kroening explains:

 "For systems with a single PCI segment, it is sufficient to look for the
  bus number to change in order to determine that all of the CHa's have
  been counted for a single socket.

  However, for multi PCI segment systems, each socket is given a new
  segment and the bus number does NOT change.  So looking only for the
  bus number to change ends up counting all of the CHa's on all sockets
  in the system.  This leads to writing CPU MSRs beyond a valid range and
  causes an error in ivbep_uncore_msr_init_box()."

To fix this bug, query the number of CHAs from the CAPID6 register:
it should read bits 27:0 in the CAPID6 register located at
Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector
of available LLC slices and the CHAs that manage those slices.

Reported-by: Kroening, Gary <gary.kroening@hpe.com>
Tested-by: Kroening, Gary <gary.kroening@hpe.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: abanman@hpe.com
Cc: dimitri.sivanich@hpe.com
Cc: hpa@zytor.com
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Dan Carpenter
e91ec34941 perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
commit e5ea9b54a0 upstream.

We intended to clear the lowest 6 bits but because of a type bug we
clear the high 32 bits as well.  Andi says that periods are rarely more
than U32_MAX so this bug probably doesn't have a huge runtime impact.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 294fe0f52a ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:25 +02:00
Ilya Pronin
a8b3a6a4ae perf stat: Fix CVS output format for non-supported counters
commit 40c21898ba upstream.

When printing stats in CSV mode, 'perf stat' appends extra separators
when a counter is not supported:

<not supported>,,L1-dcache-store-misses,mesos/bd442f34-2b4a-47df-b966-9b281f9f56fc,0,100.00,,,,

Which causes a failure when parsing fields. The numbers of separators
should be the same for each line, no matter if the counter is or not
supported.

Signed-off-by: Ilya Pronin <ipronin@twitter.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20180306064353.31930-1-xiyou.wangcong@gmail.com
Fixes: 92a61f6412 ("perf stat: Implement CSV metrics output")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Kan Liang
b3076abb67 perf/x86/intel/uncore: Fix Skylake UPI event format
commit 317660940f upstream.

There is no event extension (bit 21) for SKX UPI, so
use 'event' instead of 'event_ext'.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Andy Lutomirski
3681c24a7d x86/entry/64: Don't use IST entry for #BP stack
commit d8ba61ba58 upstream.

There's nothing IST-worthy about #BP/int3.  We don't allow kprobes
in the small handful of places in the kernel that run at CPL0 with
an invalid stack, and 32-bit kernels have used normal interrupt
gates for #BP forever.

Furthermore, we don't allow kprobes in places that have usergs while
in kernel mode, so "paranoid" is also unnecessary.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
H.J. Lu
f9ed244572 x86/boot/64: Verify alignment of the LOAD segment
commit c55b8550fa upstream.

Since the x86-64 kernel must be aligned to 2MB, refuse to boot the
kernel if the alignment of the LOAD segment isn't a multiple of 2MB.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
H.J. Lu
678b405bff x86/build/64: Force the linker to use 2MB page size
commit e3d03598e8 upstream.

Binutils 2.31 will enable -z separate-code by default for x86 to avoid
mixing code pages with data to improve cache performance as well as
security.  To reduce x86-64 executable and shared object sizes, the
maximum page size is reduced from 2MB to 4KB.  But x86-64 kernel must
be aligned to 2MB.  Pass -z max-page-size=0x200000 to linker to force
2MB page size regardless of the default page size used by linker.

Tested with Linux kernel 4.15.6 on x86-64.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOp4_%3D_8twdpTyAP2DhONOCeaTOsniJLoppzhoNptL8xzA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Linus Torvalds
587da2b628 kvm/x86: fix icebp instruction handling
commit 32d43cd391 upstream.

The undocumented 'icebp' instruction (aka 'int1') works pretty much like
'int3' in the absense of in-circuit probing equipment (except,
obviously, that it raises #DB instead of raising #BP), and is used by
some validation test-suites as such.

But Andy Lutomirski noticed that his test suite acted differently in kvm
than on bare hardware.

The reason is that kvm used an inexact test for the icebp instruction:
it just assumed that an all-zero VM exit qualification value meant that
the VM exit was due to icebp.

That is not unlike the guess that do_debug() does for the actual
exception handling case, but it's purely a heuristic, not an absolute
rule.  do_debug() does it because it wants to ascribe _some_ reasons to
the #DB that happened, and an empty %dr6 value means that 'icebp' is the
most likely casue and we have no better information.

But kvm can just do it right, because unlike the do_debug() case, kvm
actually sees the real reason for the #DB in the VM-exit interruption
information field.

So instead of relying on an inexact heuristic, just use the actual VM
exit information that says "it was 'icebp'".

Right now the 'icebp' instruction isn't technically documented by Intel,
but that will hopefully change.  The special "privileged software
exception" information _is_ actually mentioned in the Intel SDM, even
though the cause of it isn't enumerated.

Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Andy Lutomirski
c68a7a87e1 selftests/x86/ptrace_syscall: Fix for yet more glibc interference
commit 4b0b37d4cc upstream.

glibc keeps getting cleverer, and my version now turns raise() into
more than one syscall.  Since the test relies on ptrace seeing an
exact set of syscalls, this breaks the test.  Replace raise(SIGSTOP)
with syscall(SYS_tgkill, ...) to force glibc to get out of our way.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/bc80338b453afa187bc5f895bd8e2c8d6e264da2.1521300271.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Linus Torvalds
7c28067736 tty: vt: fix up tabstops properly
commit f1869a890c upstream.

Tabs on a console with long lines do not wrap properly, so correctly
account for the line length when computing the tab placement location.

Reported-by: James Holderness <j4_james@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Andri Yngvason
101a72edd9 can: cc770: Fix use after free in cc770_tx_interrupt()
commit 9ffd750394 upstream.

This fixes use after free introduced by the last cc770 patch.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Fixes: 746201235b ("can: cc770: Fix queue stall & dropped RTR reply")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:24 +02:00
Andri Yngvason
5fdbcc3d6d can: cc770: Fix queue stall & dropped RTR reply
commit 746201235b upstream.

While waiting for the TX object to send an RTR, an external message with a
matching id can overwrite the TX data. In this case we must call the rx
routine and then try transmitting the message that was overwritten again.

The queue was being stalled because the RX event did not generate an
interrupt to wake up the queue again and the TX event did not happen
because the TXRQST flag is reset by the chip when new data is received.

According to the CC770 datasheet the id of a message object should not be
changed while the MSGVAL bit is set. This has been fixed by resetting the
MSGVAL bit before modifying the object in the transmit function and setting
it after. It is not enough to set & reset CPUUPD.

It is important to keep the MSGVAL bit reset while the message object is
being modified. Otherwise, during RTR transmission, a frame with matching
id could trigger an rx-interrupt, which would cause a race condition
between the interrupt routine and the transmit function.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Andri Yngvason
01a303d27a can: cc770: Fix stalls on rt-linux, remove redundant IRQ ack
commit f4353daf49 upstream.

This has been reported to cause stalls on rt-linux.

Suggested-by: Richard Weinberger <richard@nod.at>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Marek Vasut
6e400b460a can: ifi: Check core revision upon probe
commit 591d65d5b1 upstream.

Older versions of the core are not compatible with the driver due
to various intrusive fixes of the core. Read out the VER register,
check the core revision bitfield and verify if the core in use is
new enough (rev 2.1 or newer) to work correctly with this driver.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiko Schocher <hs@denx.de>
Cc: Markus Marb <markus@marb.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Marek Vasut
a452b356c5 can: ifi: Repair the error handling
commit 880dd464b4 upstream.

The new version of the IFI CANFD core has significantly less complex
error state indication logic. In particular, the warning/error state
bits are no longer all over the place, but are all present in the
STATUS register. Moreover, there is a new IRQ register bit indicating
transition between error states (active/warning/passive/busoff).

This patch makes use of this bit to weed out the obscure selective
INTERRUPT register clearing, which was used to carry over the error
state indication into the poll function. While at it, this patch
fixes the handling of the ACTIVE state, since the hardware provides
indication of the core being in ACTIVE state and that in turn fixes
the state transition indication toward userspace. Finally, register
reads in the poll function are moved to the matching subfunctions
since those are also no longer needed in the poll function.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiko Schocher <hs@denx.de>
Cc: Markus Marb <markus@marb.org>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Dan Carpenter
5e7124c4d6 staging: ncpfs: memory corruption in ncp_read_kernel()
commit 4c41aa24ba upstream.

If the server is malicious then *bytes_read could be larger than the
size of the "target" buffer.  It would lead to memory corruption when we
do the memcpy().

Reported-by: Dr Silvio Cesare of InfoSect <Silvio Cesare <silvio.cesare@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Jagdish Gediya
4d9ed68855 mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
commit 6b00c35138 upstream.

Due to missing information in Hardware manual, current
implementation doesn't read ECCSTAT0 and ECCSTAT1 registers
for IFC 2.0.

Add support to read ECCSTAT0 and ECCSTAT1 registers during
ecccheck for IFC 2.0.

Fixes: 656441478e ("mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Jagdish Gediya
eca95cb6b4 mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
commit 843c3a5999 upstream.

Number of ECC status registers i.e. (ECCSTATx) has been increased in IFC
version 2.0.0 due to increase in SRAM size. This is causing eccstat
array to over flow.

So, replace eccstat array with u32 variable to make it fail-safe and
independent of number of ECC status registers or SRAM size.

Fixes: bccb06c353 ("mtd: nand: ifc: update bufnum mask for ver >= 2.0.0")
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Jagdish Gediya
9b5dd84950 mtd: nand: fsl_ifc: Fix nand waitfunc return value
commit fa8e6d58c5 upstream.

As per the IFC hardware manual, Most significant 2 bytes in
nand_fsr register are the outcome of NAND READ STATUS command.

So status value need to be shifted and aligned as per the nand
framework requirement.

Fixes: 82771882d9 ("NAND Machine support for Integrated Flash Controller")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com>
Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
OuYang ZhiZhong
9eab80eded mtdchar: fix usage of mtd_ooblayout_ecc()
commit 6de564939e upstream.

Section was not properly computed. The value of OOB region definition is
always ECC section 0 information in the OOB area, but we want to get all
the ECC bytes information, so we should call
mtd_ooblayout_ecc(mtd, section++, &oobregion) until it returns -ERANGE.

Fixes: c2b78452a9 ("mtd: use mtd_ooblayout_xxx() helpers where appropriate")
Cc: <stable@vger.kernel.org>
Signed-off-by: OuYang ZhiZhong <ouyzz@yealink.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Masami Hiramatsu
d434dae761 tracing: probeevent: Fix to support minus offset from symbol
commit c5d343b6b7 upstream.

In Documentation/trace/kprobetrace.txt, it says

 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)

However, the parser doesn't parse minus offset correctly, since
commit 2fba0c8867 ("tracing/kprobes: Fix probe offset to be
unsigned") drops minus ("-") offset support for kprobe probe
address usage.

This fixes the traceprobe_split_symbol_offset() to parse minus
offset again with checking the offset range, and add a minus
offset check in kprobe probe address usage.

Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 2fba0c8867 ("tracing/kprobes: Fix probe offset to be unsigned")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:23 +02:00
Larry Finger
0e17fddb64 rtlwifi: rtl8723be: Fix loss of signal
commit 78dc897b7e upstream.

In commit c713fb071e ("rtlwifi: rtl8821ae: Fix connection lost problem
correctly") a problem in rtl8821ae that caused loss of signal was fixed.
That same problem has now been reported for rtl8723be. Accordingly,
the ASPM L1 latency has been increased from 0 to 7 to fix the instability.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
Tested-by: James Cameron <quozl@laptop.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Arend Van Spriel
0a9be1b132 brcmfmac: fix P2P_DEVICE ethernet address generation
commit 455f3e76cf upstream.

The firmware has a requirement that the P2P_DEVICE address should
be different from the address of the primary interface. When not
specified by user-space, the driver generates the MAC address for
the P2P_DEVICE interface using the MAC address of the primary
interface and setting the locally administered bit. However, the MAC
address of the primary interface may already have that bit set causing
the creation of the P2P_DEVICE interface to fail with -EBUSY. Fix this
by using a random address instead to determine the P2P_DEVICE address.

Cc: stable@vger.kernel.org # 3.10.y
Reported-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Vishal Verma
6fa877d2ac libnvdimm, {btt, blk}: do integrity setup before add_disk()
commit 3ffb0ba9b5 upstream.

Prior to 25520d55cd ("block: Inline blk_integrity in struct gendisk")
we needed to temporarily add a zero-capacity disk before registering for
blk-integrity. But adding a zero-capacity disk caused the partition
table scanning to bail early, and this resulted in partitions not coming
up after a probe of the BTT or blk namespaces.

We can now register for integrity before the disk has been added, and
this fixes the rescan problems.

Fixes: 25520d55cd ("block: Inline blk_integrity in struct gendisk")
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Takashi Iwai
d0826ba87d ACPI / watchdog: Fix off-by-one error at resource assignment
commit b1abf6fc49 upstream.

The resource allocation in WDAT watchdog has off-one-by error, it sets
one byte more than the actual end address.  This may eventually lead
to unexpected resource conflicts.

Fixes: 058dfc7670 (ACPI / watchdog: Add support for WDAT hardware watchdog)
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Dan Williams
f33db316d0 acpi, numa: fix pxm to online numa node associations
commit dc9e0a9347 upstream.

Commit 99759869fa "acpi: Add acpi_map_pxm_to_online_node()" added
support for mapping a given proximity to its nearest, by SLIT distance,
online node. However, it sometimes returns unexpected results due to the
fact that it switches from comparing the PXM node to the last node that
was closer than the current max.

    for_each_online_node(n) {
            dist = node_distance(node, n);
            if (dist < min_dist) {
                    min_dist = dist;
                    node = n;	<---- from this point we're using the
				      wrong node for node_distance()


Fixes: 99759869fa ("acpi: Add acpi_map_pxm_to_online_node()")
Cc: <stable@vger.kernel.org>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Greg Kroah-Hartman
4ac9ab4f5f drm: udl: Properly check framebuffer mmap offsets
commit 3b82a4db8e upstream.

The memmap options sent to the udl framebuffer driver were not being
checked for all sets of possible crazy values.  Fix this up by properly
bounding the allowed values.

Reported-by: Eyal Itkin <eyalit@checkpoint.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180321154553.GA18454@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Michel Dänzer
e664e6d663 drm/radeon: Don't turn off DP sink when disconnected
commit 2681bc79ee upstream.

Turning off the sink in this case causes various issues, because
userspace expects it to stay on until it turns it off explicitly.

Instead, turn the sink off and back on when a display is connected
again. This dance seems necessary for link training to work correctly.

Bugzilla: https://bugs.freedesktop.org/105308
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Thomas Hellstrom
f0c88241d3 drm/vmwgfx: Fix a destoy-while-held mutex problem.
commit 73a88250b7 upstream.

When validating legacy surfaces, the backup bo might be destroyed at
surface validate time. However, the kms resource validation code may have
the bo reserved, so we will destroy a locked mutex. While there shouldn't
be any other users of that mutex when it is destroyed, it causes a lock
leak and thus throws a lockdep error.

Fix this by having the kms resource validation code hold a reference to
the bo while we have it reserved. We do this by introducing a validation
context which might come in handy when the kms code is extended to validate
multiple resources or buffers.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Kirill A. Shutemov
8ab899550b mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
commit b3cd54b257 upstream.

shmem_unused_huge_shrink() gets called from reclaim path.  Waiting for
page lock may lead to deadlock there.

There was a bug report that may be attributed to this:

  http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.net

Replace lock_page() with trylock_page() and skip the page if we failed
to lock it.  We will get to the page on the next scan.

We can test for the PageTransHuge() outside the page lock as we only
need protection against splitting the page under us.  Holding pin oni
the page is enough for this.

Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel.com
Fixes: 779750d20b ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Eric Wheeler <linux-mm@lists.ewheeler.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:22 +02:00
Kirill A. Shutemov
142d9dda9e mm/thp: do not wait for lock_page() in deferred_split_scan()
commit fa41b900c3 upstream.

deferred_split_scan() gets called from reclaim path.  Waiting for page
lock may lead to deadlock there.

Replace lock_page() with trylock_page() and skip the page if we failed
to lock it.  We will get to the page on the next scan.

Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel.com
Fixes: 9a982250f7 ("thp: introduce deferred_split_huge_page()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Kirill A. Shutemov
24284d5f53 mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
commit fece2029a9 upstream.

khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
mapped.  We do not collapse such pages.  See check
khugepaged_scan_pmd().

But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
somebody managed to instantiate THP in the range and then split the PMD
back to PTEs we would have a problem --
VM_BUG_ON_PAGE(PageCompound(page)) will get triggered.

It's possible since we drop mmap_sem during collapse to re-take for
write.

Replace the VM_BUG_ON() with graceful collapse fail.

Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com
Fixes: b1caa957ae ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Toshi Kani
f4fe4f987a x86/mm: implement free pmd/pte page interfaces
commit 28ee90fe60 upstream.

Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).

The address range associated with the pud/pmd entry must have been
purged by INVLPG.

Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
Fixes: e61ce6ade4 ("mm: change ioremap to set up huge I/O mappings")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Toshi Kani
9c7f7bdb19 mm/vmalloc: add interfaces to free unmapped page table
commit b6bdb7517c upstream.

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
create pud/pmd mappings.  A kernel panic was observed on arm64 systems
with Cortex-A75 in the following steps as described by Hanjun Guo.

 1. ioremap a 4K size, valid page table will build,
 2. iounmap it, pte0 will set to 0;
 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
    then set the a new value for pmd;
 4. pte0 is leaked;
 5. CPU may meet exception because the old pmd is still in TLB,
    which will lead to kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.  x86
still has memory leak.

The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:

 - The iounmap() path is shared with vunmap(). Since vmap() only
   supports pte mappings, making vunmap() to free a pte page is an
   overhead for regular vmap users as they do not need a pte page freed
   up.

 - Checking if all entries in a pte page are cleared in the unmap path
   is racy, and serializing this check is expensive.

 - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
   Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
   purge.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
clear a given pud/pmd entry and free up a page for the lower level
entries.

This patch implements their stub functions on x86 and arm64, which work
as workaround.

[akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
Fixes: e61ce6ade4 ("mm: change ioremap to set up huge I/O mappings")
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Jeff Layton
e294c4c2d3 nfsd: remove blocked locks on client teardown
commit 68ef3bc316 upstream.

We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.

Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.

With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.

Fixes: 76d348fadf (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Hans de Goede
8d7a2a6d45 libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
commit d418ff56b8 upstream.

When commit 9c7be59fc5 ("libata: Apply NOLPM quirk to Crucial MX100
512GB SSDs") was added it inherited the ATA_HORKAGE_NO_NCQ_TRIM quirk
from the existing "Crucial_CT*MX100*" entry, but that entry sets model_rev
to "MU01", where as the entry adding the NOLPM quirk sets it to NULL.

This means that after this commit we no apply the NO_NCQ_TRIM quirk to
all "Crucial_CT512MX100*" SSDs even if they have the fixed "MU02"
firmware. This commit splits the "Crucial_CT512MX100*" quirk into 2
quirks, one for the "MU01" firmware and one for all other firmware
versions, so that we once again only apply the NO_NCQ_TRIM quirk to the
"MU01" firmware version.

Fixes: 9c7be59fc5 ("libata: Apply NOLPM quirk to ... MX100 512GB SSDs")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Hans de Goede
ed57941c84 libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
commit 3bf7b5d6d0 upstream.

Commit b17e5729a6 ("libata: disable LPM for Crucial BX100 SSD 500GB
drive"), introduced a ATA_HORKAGE_NOLPM quirk for Crucial BX100 500GB SSDs
but limited this to the MU02 firmware version, according to:
http://www.crucial.com/usa/en/support-ssd-firmware

MU02 is the last version, so there are no newer possibly fixed versions
and if the MU02 version has broken LPM then the MU01 almost certainly
also has broken LPM, so this commit changes the quirk to apply to all
firmware versions.

Fixes: b17e5729a6 ("libata: disable LPM for Crucial BX100 SSD 500GB...")
Cc: stable@vger.kernel.org
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Hans de Goede
a7262d24ab libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
commit 62ac3f7305 upstream.

There have been reports of the Crucial M500 480GB model not working
with LPM set to min_power / med_power_with_dipm level.

It has not been tested with medium_power, but that typically has no
measurable power-savings.

Note the reporters Crucial_CT480M500SSD3 has a firmware version of MU03
and there is a MU05 update available, but that update does not mention any
LPM fixes in its changelog, so the quirk matches all firmware versions.

In my experience the LPM problems with (older) Crucial SSDs seem to be
limited to higher capacity versions of the SSDs (different firmware?),
so this commit adds a NOLPM quirk for the 480 and 960GB versions of the
M500, to avoid LPM causing issues with these SSDs.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:21 +02:00
Ju Hyung Park
db4a121a89 libata: Enable queued TRIM for Samsung SSD 860
commit ca6bfcb2f6 upstream.

Samsung explicitly states that queued TRIM is supported for Linux with
860 PRO and 860 EVO.

Make the previous blacklist to cover only 840 and 850 series.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:20 +02:00
Kai-Heng Feng
2bcfcae486 libata: disable LPM for Crucial BX100 SSD 500GB drive
commit b17e5729a6 upstream.

After Laptop Mode Tools starts to use min_power for LPM, a user found
out Crucial BX100 SSD can't get mounted.

Crucial BX100 SSD 500GB drive don't work well with min_power. This also
happens to med_power_with_dipm.

So let's disable LPM for Crucial BX100 SSD 500GB drive.

BugLink: https://bugs.launchpad.net/bugs/1726930
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:20 +02:00
Hans de Goede
a9f062b850 libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
commit 9c7be59fc5 upstream.

Various people have reported the Crucial MX100 512GB model not working
with LPM set to min_power. I've now received a report that it also does
not work with the new med_power_with_dipm level.

It does work with medium_power, but that has no measurable power-savings
and given the amount of people being bitten by the other levels not
working, this commit just disables LPM altogether.

Note all reporters of this have either the 512GB model (max capacity), or
are not specifying their SSD's size. So for now this quirk assumes this is
a problem with the 512GB model only.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=89261
Buglink: https://github.com/linrunner/TLP/issues/84
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:39:20 +02:00