Commit graph

2908 commits

Author SHA1 Message Date
Mark Rutland
59d2dc7710 arm64: ftrace: add missing BTIs
commit 35b6b28e69 upstream.

When branch target identifiers are in use, code reachable via an
indirect branch requires a BTI landing pad at the branch target site.

When building FTRACE_WITH_REGS atop patchable-function-entry, we miss
BTIs at the start start of the `ftrace_caller` and `ftrace_regs_caller`
trampolines, and when these are called from a module via a PLT (which
will use a `BR X16`), we will encounter a BTI failure, e.g.

| # insmod lkdtm.ko
| lkdtm: No crash points registered, enable through debugfs
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # cat /sys/kernel/debug/provoke-crash/DIRECT
| Unhandled 64-bit el1h sync exception on CPU0, ESR 0x34000001 -- BTI
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400405 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=jc)
| pc : ftrace_caller+0x0/0x3c
| lr : lkdtm_debugfs_open+0xc/0x20 [lkdtm]
| sp : ffff800012e43b00
| x29: ffff800012e43b00 x28: 0000000000000000 x27: ffff800012e43c88
| x26: 0000000000000000 x25: 0000000000000000 x24: ffff0000c171f200
| x23: ffff0000c27b1e00 x22: ffff0000c2265240 x21: ffff0000c23c8c30
| x20: ffff8000090ba380 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff80001002bb4c x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000900ff0
| x11: ffff0000c4166310 x10: ffff800012e43b00 x9 : ffff8000104f2384
| x8 : 0000000000000001 x7 : 0000000000000000 x6 : 000000000000003f
| x5 : 0000000000000040 x4 : ffff800012e43af0 x3 : 0000000000000001
| x2 : ffff8000090b0000 x1 : ffff0000c171f200 x0 : ffff0000c23c8c30
| Kernel panic - not syncing: Unhandled exception
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1a4
|  show_stack+0x24/0x30
|  dump_stack_lvl+0x68/0x84
|  dump_stack+0x1c/0x38
|  panic+0x168/0x360
|  arm64_exit_nmi.isra.0+0x0/0x80
|  el1h_64_sync_handler+0x68/0xd4
|  el1h_64_sync+0x78/0x7c
|  ftrace_caller+0x0/0x3c
|  do_dentry_open+0x134/0x3b0
|  vfs_open+0x38/0x44
|  path_openat+0x89c/0xe40
|  do_filp_open+0x8c/0x13c
|  do_sys_openat2+0xbc/0x174
|  __arm64_sys_openat+0x6c/0xbc
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0xdc/0x100
|  do_el0_svc+0x84/0xa0
|  el0_svc+0x28/0x80
|  el0t_64_sync_handler+0xa8/0x130
|  el0t_64_sync+0x1a0/0x1a4
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0,00000f42,da660c5f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---

Fix this by adding the required `BTI C`, as we only require these to be
reachable via BL for direct calls or BR X16/X17 for PLTs. For now, these
are open-coded in the function prologue, matching the style of the
`__hwasan_tag_mismatch` trampoline.

In future we may wish to consider adding a new SYM_CODE_START_*()
variant which has an implicit BTI.

When ftrace is built atop mcount, the trampolines are marked with
SYM_FUNC_START(), and so get an implicit BTI. We may need to change
these over to SYM_CODE_START() in future for RELIABLE_STACKTRACE, in
case we need to apply special care aroud the return address being
rewritten.

Fixes: 97fed779f2 ("arm64: bti: Provide Kconfig for kernel mode BTI")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129135709.2274019-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 09:03:24 +01:00
Nick Desaulniers
af1d3c437e arm64: vdso32: suppress error message for 'make mrproper'
commit 14831fad73 upstream.

When running the following command without arm-linux-gnueabi-gcc in
one's $PATH, the following warning is observed:

$ ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make -j72 LLVM=1 mrproper
make[1]: arm-linux-gnueabi-gcc: No such file or directory

This is because KCONFIG is not run for mrproper, so CONFIG_CC_IS_CLANG
is not set, and we end up eagerly evaluating various variables that try
to invoke CC_COMPAT.

This is a similar problem to what was observed in
commit dc960bfeed ("h8300: suppress error messages for 'make clean'")

Reported-by: Lucas Henneman <henneman@google.com>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20211019223646.1146945-4-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-26 10:39:18 +01:00
Dan Li
0a511ba6d2 arm64: Mark __stack_chk_guard as __ro_after_init
[ Upstream commit 9fcb2e93f4 ]

__stack_chk_guard is setup once while init stage and never changed
after that.

Although the modification of this variable at runtime will usually
cause the kernel to crash (so does the attacker), it should be marked
as __ro_after_init, and it should not affect performance if it is
placed in the ro_after_init section.

Signed-off-by: Dan Li <ashimida@linux.alibaba.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1631612642-102881-1-git-send-email-ashimida@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30 10:11:07 +02:00
Thomas Gleixner
b9a1526d51 drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
[ Upstream commit 4b92d4add5 ]

DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.

The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.

This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.

Fixes: 8571890e15 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-26 14:08:59 +02:00
Mark Brown
a67e7cdbc6 arm64/sve: Use correct size when reinitialising SVE state
commit e35ac9d0b5 upstream.

When we need a buffer for SVE register state we call sve_alloc() to make
sure that one is there. In order to avoid repeated allocations and frees
we keep the buffer around unless we change vector length and just memset()
it to ensure a clean register state. The function that deals with this
takes the task to operate on as an argument, however in the case where we
do a memset() we initialise using the SVE state size for the current task
rather than the task passed as an argument.

This is only an issue in the case where we are setting the register state
for a task via ptrace and the task being configured has a different vector
length to the task tracing it. In the case where the buffer is larger in
the traced process we will leak old state from the traced process to
itself, in the case where the buffer is smaller in the traced process we
will overflow the buffer and corrupt memory.

Fixes: bc0ee47603 ("arm64/sve: Core task context handling")
Cc: <stable@vger.kernel.org> # 4.15.x
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210909165356.10675-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:27:54 +02:00
Mark Rutland
de32e15180 arm64: head: avoid over-mapping in map_memory
commit 90268574a3 upstream.

The `compute_indices` and `populate_entries` macros operate on inclusive
bounds, and thus the `map_memory` macro which uses them also operates
on inclusive bounds.

We pass `_end` and `_idmap_text_end` to `map_memory`, but these are
exclusive bounds, and if one of these is sufficiently aligned (as a
result of kernel configuration, physical placement, and KASLR), then:

* In `compute_indices`, the computed `iend` will be in the page/block *after*
  the final byte of the intended mapping.

* In `populate_entries`, an unnecessary entry will be created at the end
  of each level of table. At the leaf level, this entry will map up to
  SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend
  to map.

As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may
violate the boot protocol and map physical address past the 2MiB-aligned
end address we are permitted to map. As we map these with Normal memory
attributes, this may result in further problems depending on what these
physical addresses correspond to.

The final entry at each level may require an additional table at that
level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate
enough memory for this.

Avoid the extraneous mapping by having map_memory convert the exclusive
end address to an inclusive end address by subtracting one, and do
likewise in EARLY_ENTRIES() when calculating the number of required
tables. For clarity, comments are updated to more clearly document which
boundaries the macros operate on.  For consistency with the other
macros, the comments in map_memory are also updated to describe `vstart`
and `vend` as virtual addresses.

Fixes: 0370b31e48 ("arm64: Extend early page table code to allow for larger kernels")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18 13:40:08 +02:00
Mark Rutland
3d7d1b0f5f arm64: fix compat syscall return truncation
commit e30e8d46cf upstream.

Due to inconsistencies in the way we manipulate compat GPRs, we have a
few issues today:

* For audit and tracing, where error codes are handled as a (native)
  long, negative error codes are expected to be sign-extended to the
  native 64-bits, or they may fail to be matched correctly. Thus a
  syscall which fails with an error may erroneously be identified as
  failing.

* For ptrace, *all* compat return values should be sign-extended for
  consistency with 32-bit arm, but we currently only do this for
  negative return codes.

* As we may transiently set the upper 32 bits of some compat GPRs while
  in the kernel, these can be sampled by perf, which is somewhat
  confusing. This means that where a syscall returns a pointer above 2G,
  this will be sign-extended, but will not be mistaken for an error as
  error codes are constrained to the inclusive range [-4096, -1] where
  no user pointer can exist.

To fix all of these, we must consistently use helpers to get/set the
compat GPRs, ensuring that we never write the upper 32 bits of the
return code, and always sign-extend when reading the return code.  This
patch does so, with the following changes:

* We re-organise syscall_get_return_value() to always sign-extend for
  compat tasks, and reimplement syscall_get_error() atop. We update
  syscall_trace_exit() to use syscall_get_return_value().

* We consistently use syscall_set_return_value() to set the return
  value, ensureing the upper 32 bits are never set unexpectedly.

* As the core audit code currently uses regs_return_value() rather than
  syscall_get_return_value(), we special-case this for
  compat_user_mode(regs) such that this will do the right thing. Going
  forward, we should try to move the core audit code over to
  syscall_get_return_value().

Cc: <stable@vger.kernel.org>
Reported-by: He Zhe <zhe.he@windriver.com>
Reported-by: weiyuchen <weiyuchen3@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210802104200.21390-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
[Mark: trivial conflict resolution for v5.10.y]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:20 +02:00
Mark Rutland
e5d8fd8709 arm64: stacktrace: avoid tracing arch_stack_walk()
commit 0c32706dac upstream.

When the function_graph tracer is in use, arch_stack_walk() may unwind
the stack incorrectly, erroneously reporting itself, missing the final
entry which is being traced, and reporting all traced entries between
these off-by-one from where they should be.

When ftrace hooks a function return, the original return address is
saved to the fgraph ret_stack, and the return address  in the LR (or the
function's frame record) is replaced with `return_to_handler`.

When arm64's unwinder encounter frames returning to `return_to_handler`,
it finds the associated original return address from the fgraph ret
stack, assuming the most recent `ret_to_hander` entry on the stack
corresponds to the most recent entry in the fgraph ret stack, and so on.

When arch_stack_walk() is used to dump the current task's stack, it
starts from the caller of arch_stack_walk(). However, arch_stack_walk()
can be traced, and so may push an entry on to the fgraph ret stack,
leaving the fgraph ret stack offset by one from the expected position.

This can be seen when dumping the stack via /proc/self/stack, where
enabling the graph tracer results in an unexpected
`stack_trace_save_tsk` entry at the start of the trace, and `el0_svc`
missing form the end of the trace.

This patch fixes this by marking arch_stack_walk() as notrace, as we do
for all other functions on the path to ftrace_graph_get_ret_stack().
While a few helper functions are not marked notrace, their calls/returns
are balanced, and will have no observable effect when examining the
fgraph ret stack.

It is possible for an exeption boundary to cause a similar offset if the
return address of the interrupted context was in the LR. Fixing those
cases will require some more substantial rework, and is left for
subsequent patches.

Before:

| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0xa4/0x110
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c

After:

| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c

Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviwed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210802164845.45506-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:12 +02:00
Anshuman Khandual
bb5e089df7 arm64/mm: Fix ttbr0 values stored in struct thread_info for software-pan
[ Upstream commit 9163f01130 ]

When using CONFIG_ARM64_SW_TTBR0_PAN, a task's thread_info::ttbr0 must be
the TTBR0_EL1 value used to run userspace. With 52-bit PAs, the PA must be
packed into the TTBR using phys_to_ttbr(), but we forget to do this in some
of the SW PAN code. Thus, if the value is installed into TTBR0_EL1 (as may
happen in the uaccess routines), this could result in UNPREDICTABLE
behaviour.

Since hardware with 52-bit PA support almost certainly has HW PAN, which
will be used in preference, this shouldn't be a practical issue, but let's
fix this for consistency.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Fixes: 529c4b05a3 ("arm64: handle 52-bit addresses in TTBR")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1623749578-11231-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:07 +02:00
Mark Rutland
8d6acfe80d arm64: consistently use reserved_pg_dir
[ Upstream commit 833be850f1 ]

Depending on configuration options and specific code paths, we either
use the empty_zero_page or the configuration-dependent reserved_ttbr0
as a reserved value for TTBR{0,1}_EL1.

To simplify this code, let's always allocate and use the same
reserved_pg_dir, replacing reserved_ttbr0. Note that this is allocated
(and hence pre-zeroed), and is also marked as read-only in the kernel
Image mapping.

Keeping this separate from the empty_zero_page potentially helps with
robustness as the empty_zero_page is used in a number of cases where a
failure to map it read-only could allow it to become corrupted.

The (presently unused) swapper_pg_end symbol is also removed, and
comments are added wherever we rely on the offsets between the
pre-allocated pg_dirs to keep these cases easily identifiable.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103102229.8542-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:06 +02:00
Tian Tao
d0214b841c arm64: perf: Convert snprintf to sysfs_emit
[ Upstream commit a5740e9555 ]

Use sysfs_emit instead of snprintf to avoid buf overrun,because in
sysfs_emit it strictly checks whether buf is null or buf whether
pagesize aligned, otherwise it returns an error.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Link: https://lore.kernel.org/r/1621497585-30887-1-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:52 +02:00
Valentin Schneider
3c51d82d0b sched/core: Initialize the idle task with preemption disabled
[ Upstream commit f1a0a376ca ]

As pointed out by commit

  de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")

init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.

As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().

Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().

Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().

Secondary startups were patched via coccinelle:

  @begone@
  @@

  -preempt_disable();
  ...
  cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:55:50 +02:00
Mark Rutland
51524fa8b5 arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
[ Upstream commit 4d6a38da8e ]

Zenghui reports that booting a kernel with "irqchip.gicv3_pseudo_nmi=1"
on the command line hits a warning during kernel entry, due to the way
we manipulate the PMR.

Early in the entry sequence, we call lockdep_hardirqs_off() to inform
lockdep that interrupts have been masked (as the HW sets DAIF wqhen
entering an exception). Architecturally PMR_EL1 is not affected by
exception entry, and we don't set GIC_PRIO_PSR_I_SET in the PMR early in
the exception entry sequence, so early in exception entry the PMR can
indicate that interrupts are unmasked even though they are masked by
DAIF.

If DEBUG_LOCKDEP is selected, lockdep_hardirqs_off() will check that
interrupts are masked, before we set GIC_PRIO_PSR_I_SET in any of the
exception entry paths, and hence lockdep_hardirqs_off() will WARN() that
something is amiss.

We can avoid this by consistently setting GIC_PRIO_PSR_I_SET during
exception entry so that kernel code sees a consistent environment. We
must also update local_daif_inherit() to undo this, as currently only
touches DAIF. For other paths, local_daif_restore() will update both
DAIF and the PMR. With this done, we can remove the existing special
cases which set this later in the entry code.

We always use (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) for consistency with
local_daif_save(), as this will warn if it ever encounters
(GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET), and never sets this itself. This
matches the gic_prio_kentry_setup that we have to retain for
ret_to_user.

The original splat from Zenghui's report was:

| DEBUG_LOCKS_WARN_ON(!irqs_disabled())
| WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
| Modules linked in:
| CPU: 3 PID: 125 Comm: modprobe Tainted: G        W         5.12.0-rc8+ #463
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : lockdep_hardirqs_off+0xd4/0xe8
| lr : lockdep_hardirqs_off+0xd4/0xe8
| sp : ffff80002a39bad0
| pmr_save: 000000e0
| x29: ffff80002a39bad0 x28: ffff0000de214bc0
| x27: ffff0000de1c0400 x26: 000000000049b328
| x25: 0000000000406f30 x24: ffff0000de1c00a0
| x23: 0000000020400005 x22: ffff8000105f747c
| x21: 0000000096000044 x20: 0000000000498ef9
| x19: ffff80002a39bc88 x18: ffffffffffffffff
| x17: 0000000000000000 x16: ffff800011c61eb0
| x15: ffff800011700a88 x14: 0720072007200720
| x13: 0720072007200720 x12: 0720072007200720
| x11: 0720072007200720 x10: 0720072007200720
| x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
| x7 : ffff8000119f0800 x6 : c0000000ffff7fff
| x5 : ffff8000119f07a8 x4 : 0000000000000001
| x3 : 9bcdab23f2432800 x2 : ffff800011730538
| x1 : 9bcdab23f2432800 x0 : 0000000000000000
| Call trace:
|  lockdep_hardirqs_off+0xd4/0xe8
|  enter_from_kernel_mode.isra.5+0x7c/0xa8
|  el1_abort+0x24/0x100
|  el1_sync_handler+0x80/0xd0
|  el1_sync+0x6c/0x100
|  __arch_clear_user+0xc/0x90
|  load_elf_binary+0x9fc/0x1450
|  bprm_execve+0x404/0x880
|  kernel_execve+0x180/0x188
|  call_usermodehelper_exec_async+0xdc/0x158
|  ret_from_fork+0x10/0x18

Fixes: 23529049c6 ("arm64: entry: fix non-NMI user<->kernel transitions")
Fixes: 7cd1ea1010 ("arm64: entry: fix non-NMI kernel<->kernel transitions")
Fixes: f0cd5ac1e4 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
Fixes: 2a9b3e6ac6 ("arm64: entry: fix EL1 debug transitions")
Link: https://lore.kernel.org/r/f4012761-026f-4e51-3a0c-7524e434e8b3@huawei.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210428111555.50880-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:07 +02:00
Marc Zyngier
c9f43423c4 arm64: entry: factor irq triage logic into macros
[ Upstream commit 9eb563cdab ]

In subsequent patches we'll allow an FIQ handler to be registered, and
FIQ exceptions will need to be triaged very similarly to IRQ exceptions.
So that we can reuse the existing logic, this patch factors the IRQ
triage logic out into macros that can be reused for FIQ.

The macros are named to follow the elX_foo_handler scheme used by the C
exception handlers. For consistency with other top-level exception
handlers, the kernel_entry/kernel_exit logic is not moved into the
macros. As FIQ will use a different C handler, this handler name is
provided as an argument to the macros.

There should be no functional change as a result of this patch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
[Mark: rework macros, commit message, rebase before DAIF rework]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:07 +02:00
Bill Wendling
0f90559366 arm64/vdso: Discard .note.gnu.property sections in vDSO
[ Upstream commit 388708028e ]

The arm64 assembler in binutils 2.32 and above generates a program
property note in a note section, .note.gnu.property, to encode used x86
ISAs and features. But the kernel linker script only contains a single
NOTE segment:

  PHDRS
  {
    text    PT_LOAD    FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
    dynamic PT_DYNAMIC FLAGS(4);               /* PF_R */
    note    PT_NOTE    FLAGS(4);               /* PF_R */
  }

The NOTE segment generated by the vDSO linker script is aligned to 4 bytes.
But the .note.gnu.property section must be aligned to 8 bytes on arm64.

  $ readelf -n vdso64.so

  Displaying notes found in: .note
    Owner                Data size      Description
    Linux                0x00000004     Unknown note type: (0x00000000)
     description data: 06 00 00 00
  readelf: Warning: note with invalid namesz and/or descsz found at offset 0x20
  readelf: Warning:  type: 0x78, namesize: 0x00000100, descsize: 0x756e694c, alignment: 8

Since the note.gnu.property section in the vDSO is not checked by the
dynamic linker, discard the .note.gnu.property sections in the vDSO.

Similar to commit 4caffe6a28 ("x86/vdso: Discard .note.gnu.property
sections in vDSO"), but for arm64.

Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210423205159.830854-1-morbo@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11 14:47:33 +02:00
Jisheng Zhang
509ae27a18 arm64: kprobes: Restore local irqflag if kprobes is cancelled
[ Upstream commit 738fa58ee1 ]

If instruction being single stepped caused a page fault, the kprobes
is cancelled to let the page fault handler continue as a normal page
fault. But the local irqflags are disabled so cpu will restore pstate
with DAIF masked. After pagefault is serviced, the kprobes is
triggerred again, we overwrite the saved_irqflag by calling
kprobes_save_local_irqflag(). NOTE, DAIF is masked in this new saved
irqflag. After kprobes is serviced, the cpu pstate is retored with
DAIF masked.

This patch is inspired by one patch for riscv from Liao Chang.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210412174101.6bfb0594@xhacker.debian
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-28 13:40:02 +02:00
Catalin Marinas
496e2fabbb arm64: mte: Ensure TIF_MTE_ASYNC_FAULT is set atomically
commit 2decad92f4 upstream.

The entry from EL0 code checks the TFSRE0_EL1 register for any
asynchronous tag check faults in user space and sets the
TIF_MTE_ASYNC_FAULT flag. This is not done atomically, potentially
racing with another CPU calling set_tsk_thread_flag().

Replace the non-atomic ORR+STR with an STSET instruction. While STSET
requires ARMv8.1 and an assembler that understands LSE atomics, the MTE
feature is part of ARMv8.5 and already requires an updated assembler.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 637ec831ea ("arm64: mte: Handle synchronous and asynchronous tag check faults")
Cc: <stable@vger.kernel.org> # 5.10.x
Reported-by: Will Deacon <will@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210409173710.18582-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-21 13:01:00 +02:00
Suzuki K Poulose
fa0c0dce58 KVM: arm64: Hide system instruction access to Trace registers
[ Upstream commit 1d676673d6 ]

Currently we advertise the ID_AA6DFR0_EL1.TRACEVER for the guest,
when the trace register accesses are trapped (CPTR_EL2.TTA == 1).
So, the guest will get an undefined instruction, if trusts the
ID registers and access one of the trace registers.
Lets be nice to the guest and hide the feature to avoid
unexpected behavior.

Even though this can be done at KVM sysreg emulation layer,
we do this by removing the TRACEVER from the sanitised feature
register field. This is fine as long as the ETM drivers
can handle the individual trace units separately, even
when there are differences among the CPUs.

Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210323120647.454211-2-suzuki.poulose@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-04-16 11:43:19 +02:00
Pavel Tatashin
3db5fc5565 arm64: kdump: update ppos when reading elfcorehdr
[ Upstream commit 141f8202cf ]

The ppos points to a position in the old kernel memory (and in case of
arm64 in the crash kernel since elfcorehdr is passed as a segment). The
function should update the ppos by the amount that was read. This bug is
not exposed by accident, but other platforms update this value properly.
So, fix it in ARM64 version of elfcorehdr_read() as well.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Fixes: e62aaeac42 ("arm64: kdump: provide /proc/vmcore file")
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210319205054.743368-1-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30 14:32:03 +02:00
Mark Rutland
1ced45535d arm64: stacktrace: don't trace arch_stack_walk()
commit c607ab4f91 upstream.

We recently converted arm64 to use arch_stack_walk() in commit:

  5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")

The core stacktrace code expects that (when tracing the current task)
arch_stack_walk() starts a trace at its caller, and does not include
itself in the trace. However, arm64's arch_stack_walk() includes itself,
and so traces include one more entry than callers expect. The core
stacktrace code which calls arch_stack_walk() tries to skip a number of
entries to prevent itself appearing in a trace, and the additional entry
prevents skipping one of the core stacktrace functions, leaving this in
the trace unexpectedly.

We can fix this by having arm64's arch_stack_walk() begin the trace with
its caller. The first value returned by the trace will be
__builtin_return_address(0), i.e. the caller of arch_stack_walk(). The
first frame record to be unwound will be __builtin_frame_address(1),
i.e. the caller's frame record. To prevent surprises, arch_stack_walk()
is also marked noinline.

While __builtin_frame_address(1) is not safe in portable code, local GCC
developers have confirmed that it is safe on arm64. To find the caller's
frame record, the builtin can safely dereference the current function's
frame record or (in theory) could stash the original FP into another GPR
at function entry time, neither of which are problematic.

Prior to this patch, the tracing code would unexpectedly show up in
traces of the current task, e.g.

| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0x98/0x100
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180

After this patch, the tracing code will not show up in such traces:

| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180

Erring on the side of caution, I've given this a spin with a bunch of
toolchains, verifying the output of /proc/self/stack and checking that
the assembly looked sound. For GCC (where we require version 5.1.0 or
later) I tested with the kernel.org crosstool binares for versions
5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and
10.1.0. For clang (where we require version 10.0.1 or later) I tested
with the llvm.org binary releases of 11.0.0, and 11.0.1.

Fixes: 5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> # 5.10.x
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30 14:31:54 +02:00
Ard Biesheuvel
3ebd4bd2eb arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
[ Upstream commit 7ba8f2b2d6 ]

52-bit VA kernels can run on hardware that is only 48-bit capable, but
configure the ID map as 52-bit by default. This was not a problem until
recently, because the special T0SZ value for a 52-bit VA space was never
programmed into the TCR register anwyay, and because a 52-bit ID map
happens to use the same number of translation levels as a 48-bit one.

This behavior was changed by commit 1401bef703 ("arm64: mm: Always update
TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ
value for a 52-bit VA to be programmed into TCR_EL1. While some hardware
simply ignores this, Mark reports that Amberwing systems choke on this,
resulting in a broken boot. But even before that commit, the unsupported
idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly
as well.

Given that we already have to deal with address spaces being either 48-bit
or 52-bit in size, the cleanest approach seems to be to simply default to
a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the
kernel in DRAM requires it. This is guaranteed not to happen unless the
system is actually 52-bit VA capable.

Fixes: 90ec95cda9 ("arm64: mm: Introduce VA_BITS_MIN")
Reported-by: Mark Salter <msalter@redhat.com>
Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17 17:06:34 +01:00
Rob Herring
fb242be88d arm64: perf: Fix 64-bit event counter read truncation
commit 7bb8bc6eb5 upstream.

Commit 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection") changed
armv8pmu_read_evcntr() to return a u32 instead of u64. The result is
silent truncation of the event counter when using 64-bit counters. Given
the offending commit appears to have passed thru several folks, it seems
likely this was a bad rebase after v8.5 PMU 64-bit counters landed.

Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection")
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210310004412.1450128-1-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17 17:06:28 +01:00
Will Deacon
0ead6914dc arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
commit a2c42bbabb upstream.

The Spectre-v4 workaround is re-configured when resuming from suspend,
as the firmware may have re-enabled the mitigation despite the user
previously asking for it to be disabled.

Enabling or disabling the workaround can result in an undefined
instruction exception on CPUs which implement PSTATE.SSBS but only allow
it to be configured by adjusting the SPSR on exception return. We handle
this by installing an 'undef hook' which effectively emulates the access.

Installing this hook requires us to take a couple of spinlocks both to
avoid corrupting the internal list of hooks but also to ensure that we
don't run into an unhandled exception. Unfortunately, when resuming from
suspend, we haven't yet called rcu_idle_exit() and so lockdep gets angry
about "suspicious RCU usage". In doing so, it tries to print a warning,
which leads it to get even more suspicious, this time about itself:

 |  rcu_scheduler_active = 2, debug_locks = 1
 |  RCU used illegally from extended quiescent state!
 |  1 lock held by swapper/0:
 |   #0: (logbuf_lock){-.-.}-{2:2}, at: vprintk_emit+0x88/0x198
 |
 |  Call trace:
 |   dump_backtrace+0x0/0x1d8
 |   show_stack+0x18/0x24
 |   dump_stack+0xe0/0x17c
 |   lockdep_rcu_suspicious+0x11c/0x134
 |   trace_lock_release+0xa0/0x160
 |   lock_release+0x3c/0x290
 |   _raw_spin_unlock+0x44/0x80
 |   vprintk_emit+0xbc/0x198
 |   vprintk_default+0x44/0x6c
 |   vprintk_func+0x1f4/0x1fc
 |   printk+0x54/0x7c
 |   lockdep_rcu_suspicious+0x30/0x134
 |   trace_lock_acquire+0xa0/0x188
 |   lock_acquire+0x50/0x2fc
 |   _raw_spin_lock+0x68/0x80
 |   spectre_v4_enable_mitigation+0xa8/0x30c
 |   __cpu_suspend_exit+0xd4/0x1a8
 |   cpu_suspend+0xa0/0x104
 |   psci_cpu_suspend_enter+0x3c/0x5c
 |   psci_enter_idle_state+0x44/0x74
 |   cpuidle_enter_state+0x148/0x2f8
 |   cpuidle_enter+0x38/0x50
 |   do_idle+0x1f0/0x2b4

Prevent these splats by running __cpu_suspend_exit() with RCU watching.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Suggested-by: "Paul E . McKenney" <paulmck@kernel.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Fixes: c28762070c ("arm64: Rewrite Spectre-v4 mitigation code")
Cc: <stable@vger.kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210218140346.5224-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:36 +01:00
He Zhe
d623d5cb38 arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
commit d47422d953 upstream.

As stated in linux/errno.h, ENOTSUPP should never be seen by user programs.
When we set up uprobe with 32-bit perf and arm64 kernel, we would see the
following vague error without useful hint.

The sys_perf_event_open() syscall returned with 524 (INTERNAL ERROR:
strerror_r(524, [buf], 128)=22)

Use EOPNOTSUPP instead to indicate such cases.

Signed-off-by: He Zhe <zhe.he@windriver.com>
Link: https://lore.kernel.org/r/20210223082535.48730-1-zhe.he@windriver.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:36 +01:00
qiuguorui1
fa1fbfb644 arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
commit 656d1d58d8 upstream.

in function create_dtb(), if fdt_open_into() fails, we need to vfree
buf before return.

Fixes: 52b2a8af74 ("arm64: kexec_file: load initrd and device-tree")
Cc: stable@vger.kernel.org # v5.0
Signed-off-by: qiuguorui1 <qiuguorui1@huawei.com>
Link: https://lore.kernel.org/r/20210218125900.6810-1-qiuguorui1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:36 +01:00
Suzuki K Poulose
32009c5d17 arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
commit c0b15c25d2 upstream.

The erratum 1024718 affects Cortex-A55 r0p0 to r2p0. However
we apply the work around for r0p0 - r1p0. Unfortunately this
won't be fixed for the future revisions for the CPU. Thus
extend the work around for all versions of A55, to cover
for r2p0 and any future revisions.

Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210203230057.3961239-1-suzuki.poulose@arm.com
[will: Update Kconfig help text]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:35 +01:00
Timothy E Baldwin
428c4a4d0d arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
commit df84fe9470 upstream.

Since commit f086f67485 ("arm64: ptrace: add support for syscall
emulation"), if system call number -1 is called and the process is being
traced with PTRACE_SYSCALL, for example by strace, the seccomp check is
skipped and -ENOSYS is returned unconditionally (unless altered by the
tracer) rather than carrying out action specified in the seccomp filter.

The consequence of this is that it is not possible to reliably strace
a seccomp based implementation of a foreign system call interface in
which r7/x8 is permitted to be -1 on entry to a system call.

Also trace_sys_enter and audit_syscall_entry are skipped if a system
call is skipped.

Fix by removing the in_syscall(regs) check restoring the previous
behaviour which is like AArch32, x86 (which uses generic code) and
everything else.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Catalin Marinas<catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: f086f67485 ("arm64: ptrace: add support for syscall emulation")
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Timothy E Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Link: https://lore.kernel.org/r/90edd33b-6353-1228-791f-0336d94d5f8c@majoroak.me.uk
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04 11:38:32 +01:00
Marc Zyngier
b138d65cce arm64: Add missing ISB after invalidating TLB in __primary_switch
[ Upstream commit 9d41053e8d ]

Although there has been a bit of back and forth on the subject, it
appears that invalidating TLBs requires an ISB instruction when FEAT_ETS
is not implemented by the CPU.

From the bible:

  | In an implementation that does not implement FEAT_ETS, a TLB
  | maintenance instruction executed by a PE, PEx, can complete at any
  | time after it is issued, but is only guaranteed to be finished for a
  | PE, PEx, after the execution of DSB by the PEx followed by a Context
  | synchronization event

Add the missing ISB in __primary_switch, just in case.

Fixes: 3c5e9f238b ("arm64: head.S: move KASLR processing out of __enable_mmu()")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224093738.3629662-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 11:38:20 +01:00
Catalin Marinas
975a2396e3 arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
commit 68d54ceeec upstream.

The ptrace(PTRACE_PEEKMTETAGS) implementation checks whether the user
page has valid tags (mapped with PROT_MTE) by testing the PG_mte_tagged
page flag. If this bit is cleared, ptrace(PTRACE_PEEKMTETAGS) returns
-EIO.

A newly created (PROT_MTE) mapping points to the zero page which had its
tags zeroed during cpu_enable_mte(). If there were no prior writes to
this mapping, ptrace(PTRACE_PEEKMTETAGS) fails with -EIO since the zero
page does not have the PG_mte_tagged flag set.

Set PG_mte_tagged on the zero page when its tags are cleared during
boot. In addition, to avoid ptrace(PTRACE_PEEKMTETAGS) succeeding on
!PROT_MTE mappings pointing to the zero page, change the
__access_remote_tags() check to (vm_flags & VM_MTE) instead of
PG_mte_tagged.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 34bfeea4a9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Cc: <stable@vger.kernel.org> # 5.10.x
Cc: Will Deacon <will@kernel.org>
Reported-by: Luis Machado <luis.machado@linaro.org>
Tested-by: Luis Machado <luis.machado@linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210210180316.23654-1-catalin.marinas@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 11:02:28 +01:00
Mark Rutland
93f8cc947b arm64: entry: remove redundant IRQ flag tracing
[ Upstream commit df06824767 ]

All EL0 returns go via ret_to_user(), which masks IRQs and notifies
lockdep and tracing before calling into do_notify_resume(). Therefore,
there's no need for do_notify_resume() to call trace_hardirqs_off(), and
the comment is stale. The call is simply redundant.

In ret_to_user() we call exit_to_user_mode(), which notifies lockdep and
tracing the IRQs will be enabled in userspace, so there's no need for
el0_svc_common() to call trace_hardirqs_on() before returning. Further,
at the start of ret_to_user() we call trace_hardirqs_off(), so not only
is this redundant, but it is immediately undone.

In addition to being redundant, the trace_hardirqs_on() in
el0_svc_common() leaves lockdep inconsistent with the hardware state,
and is liable to cause issues for any C code or instrumentation
between this and the call to trace_hardirqs_off() which undoes it in
ret_to_user().

This patch removes the redundant tracing calls and associated stale
comments.

Fixes: 23529049c6 ("arm64: entry: fix non-NMI user<->kernel transitions")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107145310.44616-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-27 11:55:07 +01:00
Shannon Zhao
ae4db0bc5c arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST
commit 45ba7b195a upstream.

Commit d82755b2e7 ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST") deletes
CONFIG_KVM_ARM_HOST option, it should use CONFIG_KVM instead.

Just remove CONFIG_KVM_ARM_HOST here.

Fixes: d82755b2e7 ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST")
Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1609760324-92271-1-git-send-email-shannon.zhao@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:17:04 +01:00
Peter Collingbourne
0e3847acca arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
[ Upstream commit 929c1f3384 ]

Previously we were always returning a tag inclusion mask of zero via
PR_GET_TAGGED_ADDR_CTRL if TCF0 was set to NONE. Fix it by making
the code for the NONE case match the others.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Iefbea66cf7d2b4c80b82f9639b9ea7f33f7fac53
Fixes: af5ce95282 ("arm64: mte: Allow user control of the generated random tags via prctl()")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201203075110.2781021-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:53:33 +01:00
Linus Torvalds
3bb61aa618 arm64 fixes for -rc7
- Fix numerous issues with instrumentation and exception entry
 
 - Fix hideous typo in unused register field definition
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl/HvbwQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNFTMB/oD0ucfP6CH65w+7Sbsv2L8FABfYzSrA9gP
 f1cmeh1+MyRN4Nbx2ves5wcRGoX1CgZ8KFAmLXG6yyn7UDA/q27CTELknwobhOft
 tQIPB2hFDW9qq3VBXFReL3aoXLnWUiRL3nBxQFt7LG1Xor/ivEb1ZFht351UklDh
 u1P6NVptpjXFuGPvdqxkHo2WzT0QHI57MRuc1l7I1FRo4dV1nKSlwohu0Ydii4q9
 8oLhx77Ga1SWK80IztNmpo7CSMP/FLGDwbUE3vAaftUJx5CBt+lYR1CeWNACSEvy
 22y7CkJWKGQccG62oHI7zQaZm1+fum70ndP5dDlfQW/BcCaz8vRH
 =KSDc
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "I'm sad to say that we've got an unusually large arm64 fixes pull for
  rc7 which addresses numerous significant instrumentation issues with
  our entry code.

  Without these patches, lockdep is hopelessly unreliable in some
  configurations [1,2] and syzkaller is therefore not a lot of use
  because it's so noisy.

  Although much of this has always been broken, it appears to have been
  exposed more readily by other changes such as 044d0d6de9 ("lockdep:
  Only trace IRQ edges") and general lockdep improvements around IRQ
  tracing and NMIs.

  Fixing this properly required moving much of the instrumentation hooks
  from our entry assembly into C, which Mark has been working on for the
  last few weeks. We're not quite ready to move to the recently added
  generic functions yet, but the code here has been deliberately written
  to mimic that closely so we can look at cleaning things up once we
  have a bit more breathing room.

  Having said all that, the second version of these patches was posted
  last week and I pushed it into our CI (kernelci and cki) along with a
  commit which forced on PROVE_LOCKING, NOHZ_FULL and
  CONTEXT_TRACKING_FORCE. The result? We found a real bug in the
  md/raid10 code [3].

  Oh, and there's also a really silly typo patch that's unrelated.

  Summary:

   - Fix numerous issues with instrumentation and exception entry

   - Fix hideous typo in unused register field definition"

[1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
[2] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
[3] https://lore.kernel.org/r/94c76d5e-466a-bc5f-e6c2-a11b65c39f83@redhat.com

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mte: Fix typo in macro definition
  arm64: entry: fix EL1 debug transitions
  arm64: entry: fix NMI {user, kernel}->kernel transitions
  arm64: entry: fix non-NMI kernel<->kernel transitions
  arm64: ptrace: prepare for EL1 irq/rcu tracking
  arm64: entry: fix non-NMI user<->kernel transitions
  arm64: entry: move el1 irq/nmi logic to C
  arm64: entry: prepare ret_to_user for function call
  arm64: entry: move enter_from_user_mode to entry-common.c
  arm64: entry: mark entry code as noinstr
  arm64: mark idle code as noinstr
  arm64: syscall: exit userspace before unmasking exceptions
2020-12-02 12:27:37 -08:00
Mark Rutland
2a9b3e6ac6 arm64: entry: fix EL1 debug transitions
In debug_exception_enter() and debug_exception_exit() we trace hardirqs
on/off while RCU isn't guaranteed to be watching, and we don't save and
restore the hardirq state, and so may return with this having changed.

Handle this appropriately with new entry/exit helpers which do the bare
minimum to ensure this is appropriately maintained, without marking
debug exceptions as NMIs. These are placed in entry-common.c with the
other entry/exit helpers.

In future we'll want to reconsider whether some debug exceptions should
be NMIs, but this will require a significant refactoring, and for now
this should prevent issues with lockdep and RCU.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marins <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:38 +00:00
Mark Rutland
f0cd5ac1e4 arm64: entry: fix NMI {user, kernel}->kernel transitions
Exceptions which can be taken at (almost) any time are consdiered to be
NMIs. On arm64 that includes:

* SDEI events
* GICv3 Pseudo-NMIs
* Kernel stack overflows
* Unexpected/unhandled exceptions

... but currently debug exceptions (BRKs, breakpoints, watchpoints,
single-step) are not considered NMIs.

As these can be taken at any time, kernel features (lockdep, RCU,
ftrace) may not be in a consistent kernel state. For example, we may
take an NMI from the idle code or partway through an entry/exit path.

While nmi_enter() and nmi_exit() handle most of this state, notably they
don't save/restore the lockdep state across an NMI being taken and
handled. When interrupts are enabled and an NMI is taken, lockdep may
see interrupts become disabled within the NMI code, but not see
interrupts become enabled when returning from the NMI, leaving lockdep
believing interrupts are disabled when they are actually disabled.

The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
shortly be moved to the generic entry code. As we can't use either yet,
we copy the x86 approach in arm64-specific helpers. All the NMI
entrypoints are marked as noinstr to prevent any instrumentation
handling code being invoked before the state has been corrected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:38 +00:00
Mark Rutland
7cd1ea1010 arm64: entry: fix non-NMI kernel<->kernel transitions
There are periods in kernel mode when RCU is not watching and/or the
scheduler tick is disabled, but we can still take exceptions such as
interrupts. The arm64 exception handlers do not account for this, and
it's possible that RCU is not watching while an exception handler runs.

The x86/generic entry code handles this by ensuring that all (non-NMI)
kernel exception handlers call irqentry_enter() and irqentry_exit(),
which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
the generic entry code, and already hadnle the user<->kernel transitions
elsewhere, so we add new kernel<->kernel transition helpers alog the
lines of the generic entry code.

Since we now track interrupts becoming masked when an exception is
taken, local_daif_inherit() is modified to track interrupts becoming
re-enabled when the original context is inherited. To balance the
entry/exit paths, each handler masks all DAIF exceptions before
exit_to_kernel_mode().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:38 +00:00
Mark Rutland
23529049c6 arm64: entry: fix non-NMI user<->kernel transitions
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE
will WARN() at boot time that interrupts are enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.

The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things so to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:

1) When we call __context_tracking_user_enter() HW IRQs must be disabled
   and lockdep must be up-to-date with this.

2) Userspace should be treated as having IRQs enabled from the PoV of
   both lockdep and tracing.

3) As context_tracking_user_enter() stops RCU from watching, we cannot
   use RCU after calling it.

4) IRQ flag tracing and lockdep have state that must be manipulated
   before RCU is disabled.

... with similar constraints applying for user->kernel transitions, with
the ordering reversed.

The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.

Note that:

* The accounting for debug exceptions from userspace now happens in
  el0_dbg() and ret_to_user(), so this is removed from
  debug_exception_enter() and debug_exception_exit(). As
  user_exit_irqoff() wakes RCU, the userspace-specific check is removed.

* The accounting for syscalls now happens in el0_svc(),
  el0_svc_compat(), and ret_to_user(), so this is removed from
  el0_svc_common(). This does not adversely affect the workaround for
  erratum 1463225, as this does not depend on any of the state tracking.

* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
  need to inform lockdep and tracing. Here a trace_hardirqs_off() is
  sufficient and safe as we have not yet exited kernel context and RCU
  is usable.

* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
  needs to check for the latter.

* EL0 SError handling will be dealt with in a subsequent patch, as this
  needs to be treated as an NMI.

Prior to this patch, booting an appropriately-configured kernel would
result in spats as below:

| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
|  check_flags.part.54+0x1dc/0x1f0
|  lock_is_held_type+0x10c/0x188
|  rcu_read_lock_sched_held+0x70/0x98
|  __context_tracking_enter+0x310/0x350
|  context_tracking_enter.part.3+0x5c/0xc8
|  context_tracking_user_enter+0x6c/0x80
|  finish_ret_to_user+0x2c/0x13cr

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:38 +00:00
Mark Rutland
105fc33520 arm64: entry: move el1 irq/nmi logic to C
In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.

In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.

Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing function, e.g.
enter_from_user_mode().

The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:38 +00:00
Mark Rutland
3cb5ed4d76 arm64: entry: prepare ret_to_user for function call
In a subsequent patch ret_to_user will need to make a C function call
(in some configurations) which may clobber x0-x18 at the start of the
finish_ret_to_user block, before enable_step_tsk consumes the flags
loaded into x1.

In preparation for this, let's load the flags into x19, which is
preserved across C function calls. This avoids a redundant reload of the
flags and ensures we operate on a consistent shapshot regardless.

There should be no functional change as a result of this patch. At this
point of the entry/exit paths we only need to preserve x28 (tsk) and the
sp, and x19 is free for this use.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:37 +00:00
Mark Rutland
2f911d494f arm64: entry: move enter_from_user_mode to entry-common.c
In later patches we'll want to extend enter_from_user_mode() and add a
corresponding exit_to_user_mode(). As these will be common for all
entries/exits from userspace, it'd be better for these to live in
entry-common.c with the rest of the entry logic.

This patch moves enter_from_user_mode() into entry-common.c. As with
other functions in entry-common.c it is marked as noinstr (which
prevents all instrumentation, tracing, and kprobes) but there are no
other functional changes.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:37 +00:00
Mark Rutland
da19267648 arm64: entry: mark entry code as noinstr
Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(),
but they're still subject to other instrumentation which may rely on
lockdep/rcu/context-tracking being up-to-date, and may cause nested
exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt
exceptions registers which have not yet been read.

Prevent this by marking all functions in entry-common.c as noinstr to
prevent compiler instrumentation. This also blacklists the functions for
tracing and kprobes, so we don't need to handle that separately.
Functions elsewhere will be dealt with in subsequent patches.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:37 +00:00
Mark Rutland
114e0a6847 arm64: mark idle code as noinstr
Core code disables RCU when calling arch_cpu_idle(), so it's not safe
for arch_cpu_idle() or its calees to be instrumented, as the
instrumentation callbacks may attempt to use RCU or other features which
are unsafe to use in this context.

Mark them noinstr to prevent issues.

The use of local_irq_enable() in arch_cpu_idle() is similarly
problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch
queued in the tip tree addresses that case.

Reported-by: Marco Elver <elver@google.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:37 +00:00
Mark Rutland
ca1314d73e arm64: syscall: exit userspace before unmasking exceptions
In el0_svc_common() we unmask exceptions before we call user_exit(), and
so there's a window where an IRQ or debug exception can be taken while
RCU is not watching. In do_debug_exception() we account for this in via
debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we
call trace functions which rely on RCU before we have a guarantee that
RCU is watching.

Let's avoid this by having el0_svc_common() exit userspace before
unmasking exceptions, matching what we do for all other EL0 entry paths.
We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ
flags while we're sure exceptions are masked in DAIF.

The workaround for Cortex-A76 erratum 1463225 may trigger a debug
exception before this point, but the debug code invoked in this case is
safe even when RCU is not watching.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-30 12:11:37 +00:00
Linus Torvalds
f91a3aa6bc Yet two more places which invoke tracing from RCU disabled regions in the
idle path. Similar to the entry path the low level idle functions have to
 be non-instrumentable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/DpAUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXSLD/9klc0YimnEnROW6Q5Svb2IcyIutmXF
 bOIY1bYYoKILOBj3wyvDUhmdMuq5zh7H9yG11hO8MaVVWVQcLcOMLdHTYm9dcdmF
 xQk33+xqjuhRShB+nEmC9ayYtWogtH6W6uZ6WDtF9ZltMKU85n5ddGJ/Fvo+HoCb
 NbOdHGJdJ3/3ZCeHnxOnxM+5/GwjkBuccTV/tXmb3yXrfU9DBySyQ4/UchcpF43w
 LcEb0kiQbpZsBTByKJOQV8+RR654S0sILlvRwVXpmj94vrgGwhlVk1/9rz7tkOhF
 ksoo1mTVu75LMt22G/hXxE63787yRvFdHjapf0+kCOAuhl992NK+xlGDH8o9DXcu
 9y73D4bI0HnDFs20w6vs20iLvxECJiYHJqlgR5ZwFUToceaNgtiYr8kzuD7Zbae1
 KG2E7BuNSwHWMtf97fGn44GZknPEOaKdDn4Wv6/bvKHxLm77qe11RKF70Stcz2AI
 am13KmQzzsHGF5qNWwpElRUxSdxfJMR66RnOdTQULGrRedaZTFol/y2pnVzTSe3k
 SZnlpL5kE7y92UYDogPb5wWA7b+YkJN0OdSkRFy1FH26ZG8E4M7ZJ2tql5Sw7pGM
 lsTjXpAUphnK5rz7QcYE8KAZWj//fIAcElIrvdklVcBnS3IqjfksYW27B64133vx
 cT1B/lA1PHXj6Q==
 =raED
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two more places which invoke tracing from RCU disabled regions in the
  idle path.

  Similar to the entry path the low level idle functions have to be
  non-instrumentable"

* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  intel_idle: Fix intel_idle() vs tracing
  sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-29 11:19:26 -08:00
Peter Zijlstra
58c644ba51 sched/idle: Fix arch_cpu_idle() vs tracing
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-24 16:47:35 +01:00
Linus Torvalds
64b609d6a6 A set of fixes for perf:
- A set of commits which reduce the stack usage of various perf event
    handling functions which allocated large data structs on stack causing
    stack overflows in the worst case.
 
  - Use the proper mechanism for detecting soft interrupts in the recursion
    protection.
 
  - Make the resursion protection simpler and more robust.
 
  - Simplify the scheduling of event groups to make the code more robust and
    prepare for fixing the issues vs. scheduling of exclusive event groups.
 
  - Prevent event multiplexing and rotation for exclusive event groups
 
  - Correct the perf event attribute exclusive semantics to take pinned
    events, e.g. the PMU watchdog, into account
 
  - Make the anythread filtering conditional for Intel's generic PMU
    counters as it is not longer guaranteed to be supported on newer
    CPUs. Check the corresponding CPUID leaf to make sure.
 
  - Fixup a duplicate initialization in an array which was probably cause by
    the usual copy & paste - forgot to edit mishap.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xIi0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofixD/4+4gc8DhOmAkMrN0Z9tiW8ebgMKmb9
 wZRkMr5Osi0GzLJOPZ6SdY6jd0A3rMN/sW6P1DT6pDtcty4bKFoW5VZBuUDIAhel
 BC4C93L3y1En/GEZu1GTy3LvsBwLBQTOoY4goDjbdAbk60S/0RTHOGyQsRsOQFe6
 fVs3iXozAFuaR6I6N3dlxuJAE51zvr8MyBWaUoByNDB//1+lLNW+JfClaAOG1oXx
 qZIg/niatBVGzSGgKNRUyh3g8G1HJtabsA/NZ4PH8ZHuYABfmj4lmmUPR77ICLfV
 wMITEBG7eaktB8EqM9hvaoOZLA5kpXHO2JbCFSs4c4x11mlC8g7QMV3poCw33YoN
 a5TmT1A3muri1riy1/Ee9lXACOq7/tf2+Xfn9o6dvDdBwd6s5pzlhLGR8gILp2lF
 2bcg3IwYvHT/Kiurb/WGNpbCqQIPJpcUcfs3tNBCCtKegahUQNnGjxN3NVo9RCit
 zfL6xIJ8eZiYnsxXx4NKm744AukWiql3aRNgRkOdBP5WC68xt6VLcxG1YZKUoDhy
 jRSOCD/DuPSMSvAAgN7S8OWlPsKWBxVxxWYV+K8FpwhgzbQ3WbS3UDiYkhgjeOxu
 OlM692oWpllKvQWlvYthr2Be6oPCRRi1vvADNNbTKzgHk5i61bwympsGl1EZx3Pz
 2ROp7NJFRESnqw==
 =FzCf
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A set of fixes for perf:

    - A set of commits which reduce the stack usage of various perf
      event handling functions which allocated large data structs on
      stack causing stack overflows in the worst case

    - Use the proper mechanism for detecting soft interrupts in the
      recursion protection

    - Make the resursion protection simpler and more robust

    - Simplify the scheduling of event groups to make the code more
      robust and prepare for fixing the issues vs. scheduling of
      exclusive event groups

    - Prevent event multiplexing and rotation for exclusive event groups

    - Correct the perf event attribute exclusive semantics to take
      pinned events, e.g. the PMU watchdog, into account

    - Make the anythread filtering conditional for Intel's generic PMU
      counters as it is not longer guaranteed to be supported on newer
      CPUs. Check the corresponding CPUID leaf to make sure

    - Fixup a duplicate initialization in an array which was probably
      caused by the usual 'copy & paste - forgot to edit' mishap"

* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix Add BW copypasta
  perf/x86/intel: Make anythread filter support conditional
  perf: Tweak perf_event_attr::exclusive semantics
  perf: Fix event multiplexing for exclusive groups
  perf: Simplify group_sched_in()
  perf: Simplify group_sched_out()
  perf/x86: Make dummy_iregs static
  perf/arch: Remove perf_sample_data::regs_user_copy
  perf: Optimize get_recursion_context()
  perf: Fix get_recursion_context()
  perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
  perf: Reduce stack usage of perf_output_begin()
2020-11-15 09:46:36 -08:00
Linus Torvalds
88b31f07f3 arm64 fixes for -rc4
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
 
 - Fix RCU splat when failing to online a CPU due to a feature mismatch
 
 - Fix a recently introduced sparse warning in kexec()
 
 - Fix handling of CPU erratum 1418040 for late CPUs
 
 - Ensure hot-added memory falls within linear-mapped region
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+ubogQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNPD7B/9i5ao44AEJwjz0a68S/jD7kUD7i3xVkCNN
 Y8i/i9mx44IAcf8pmyQh3ngaFlJuF2C6oC/SQFiDbmVeGeZXLXvXV7uGAqXG5Xjm
 O2Svgr1ry176JWpsB7MNnZwzAatQffdkDjbjQCcUnUIKYcLvge8H2fICljujGcfQ
 094vNmT9VerTWRbWDti3Ck/ug+sanVHuzk5BWdKx3jamjeTqo+sBZK/wgBr6UoYQ
 mT3BFX42kLIGg+AzwXRDPlzkJymjYgQDbSwGsvny8qKdOEJbAUwWXYZ5sTs9J/gU
 E9PT3VJI7BYtTd1uPEWkD645U3arfx3Pf2JcJlbkEp86qx4CUF9s
 =T6k4
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:

 - Spectre/Meltdown safelisting for some Qualcomm KRYO cores

 - Fix RCU splat when failing to online a CPU due to a feature mismatch

 - Fix a recently introduced sparse warning in kexec()

 - Fix handling of CPU erratum 1418040 for late CPUs

 - Ensure hot-added memory falls within linear-mapped region

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
  arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
  arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
  arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
  arm64/mm: Validate hotplug range before creating linear mapping
  arm64: smp: Tell RCU about CPUs that fail to come online
  arm64: psci: Avoid printing in cpu_psci_cpu_die()
  arm64: kexec_file: Fix sparse warning
  arm64: errata: Fix handling of 1418040 with late CPU onlining
2020-11-13 09:23:10 -08:00
Konrad Dybcio
23c2164160 arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
QCOM KRYO2XX Silver cores are Cortex-A53 based and are
susceptible to the 845719 erratum. Add them to the lookup
list to apply the erratum.

Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-5-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-13 09:47:08 +00:00
Konrad Dybcio
38328d4011 arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
KRYO2XX silver (LITTLE) CPUs are based on Cortex-A53
and they are not affected by spectre-v2.

Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-4-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-13 09:47:08 +00:00
Konrad Dybcio
e3dd11a9f2 arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
QCOM KRYO2XX gold (big) silver (LITTLE) CPU cores are based on
Cortex-A73 and Cortex-A53 respectively and are meltdown safe,
hence add them to kpti_safe_list[].

Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-3-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-11-13 09:47:08 +00:00