linux-stable/arch/arm64/kernel/stacktrace.c

246 lines
6 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* Stack tracing support
*
* Copyright (C) 2012 ARM Ltd.
*/
#include <linux/kernel.h>
#include <linux/export.h>
arm64: ftrace: fix a stack tracer's output under function graph tracer Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a) a stack tracer's output b) perf call graph (with perf record -g) c) dump_backtrace (at panic et al.) For example, in case of a), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo 1 > /proc/sys/kernel/stack_trace_enabled $ cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (54 entries) ----- ---- -------- 0) 4504 16 gic_raise_softirq+0x28/0x150 1) 4488 80 smp_cross_call+0x38/0xb8 2) 4408 48 return_to_handler+0x0/0x40 3) 4360 32 return_to_handler+0x0/0x40 ... In case of b), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ perf record -e mem:XXX:x -ag -- sleep 10 $ perf report ... | | |--0.22%-- 0x550f8 | | | 0x10888 | | | el0_svc_naked | | | sys_openat | | | return_to_handler | | | return_to_handler ... In case of c), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo c > /proc/sysrq-trigger ... Call trace: [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 [<ffffffc000092250>] return_to_handler+0x0/0x40 [<ffffffc000092250>] return_to_handler+0x0/0x40 ... This patch replaces such entries with real addresses preserved in current->ret_stack[] at unwind_frame(). This way, we can cover all the cases. Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [will: fixed minor context changes conflicting with irq stack bits] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-15 08:33:41 +00:00
#include <linux/ftrace.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>
#include <asm/irq.h>
#include <asm/pointer_auth.h>
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
/*
* AArch64 PCS assigns the frame pointer to x29.
*
* A simple function prologue looks like this:
* sub sp, sp, #0x10
* stp x29, x30, [sp]
* mov x29, sp
*
* A simple function epilogue looks like this:
* mov sp, x29
* ldp x29, x30, [sp]
* add sp, sp, #0x10
*/
void start_backtrace(struct stackframe *frame, unsigned long fp,
unsigned long pc)
{
frame->fp = fp;
frame->pc = pc;
#ifdef CONFIG_KRETPROBES
frame->kr_cur = NULL;
#endif
/*
* Prime the first unwind.
*
* In unwind_frame() we'll check that the FP points to a valid stack,
* which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
* treated as a transition to whichever stack that happens to be. The
* prev_fp value won't be used, but we set it to 0 such that it is
* definitely not an accessible stack address.
*/
bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
frame->prev_fp = 0;
frame->prev_type = STACK_TYPE_UNKNOWN;
}
/*
* Unwind from one frame record (A) to the next frame record (B).
*
* We terminate early if the location of B indicates a malformed chain of frame
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
arm64: fix dump_backtrace/unwind_frame with NULL tsk In some places, dump_backtrace() is called with a NULL tsk parameter, e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in core code. The expectation is that this is treated as if current were passed instead of NULL. Similar is true of unwind_frame(). Commit a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") didn't take this into account. In dump_backtrace() it compares tsk against current *before* we check if tsk is NULL, and in unwind_frame() we never set tsk if it is NULL. Due to this, we won't initialise irq_stack_ptr in either function. In dump_backtrace() this results in calling dump_mem() for memory immediately above the IRQ stack range, rather than for the relevant range on the task stack. In unwind_frame we'll reject unwinding frames on the IRQ stack. In either case this results in incomplete or misleading backtrace information, but is not otherwise problematic. The initial percpu areas (including the IRQ stacks) are allocated in the linear map, and dump_mem uses __get_user(), so we shouldn't access anything with side-effects, and will handle holes safely. This patch fixes the issue by having both functions handle the NULL tsk case before doing anything else with tsk. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") Acked-by: James Morse <james.morse@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yang Shi <yang.shi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-23 16:55:05 +00:00
if (!tsk)
tsk = current;
arm64: Implement stack trace termination record Reliable stacktracing requires that we identify when a stacktrace is terminated early. We can do this by ensuring all tasks have a final frame record at a known location on their task stack, and checking that this is the final frame record in the chain. We'd like to use task_pt_regs(task)->stackframe as the final frame record, as this is already setup upon exception entry from EL0. For kernel tasks we need to consistently reserve the pt_regs and point x29 at this, which we can do with small changes to __primary_switched, __secondary_switched, and copy_process(). Since the final frame record must be at a specific location, we must create the final frame record in __primary_switched and __secondary_switched rather than leaving this to start_kernel and secondary_start_kernel. Thus, __primary_switched and __secondary_switched will now show up in stacktraces for the idle tasks. Since the final frame record is now identified by its location rather than by its contents, we identify it at the start of unwind_frame(), before we read any values from it. External debuggers may terminate the stack trace when FP == 0. In the pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the debugger may print an extra record 0x0 at the end. While this is not pretty, this does not do any harm. This is a small price to pay for having reliable stack trace termination in the kernel. That said, gdb does not show the extra record probably because it uses DWARF and not frame pointers for stack traces. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> [Mark: rebase, use ASM_BUG(), update comments, update commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-10 11:00:26 +00:00
/* Final frame; nothing to unwind */
if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
return -ENOENT;
if (fp & 0x7)
arm64: Implement stack trace termination record Reliable stacktracing requires that we identify when a stacktrace is terminated early. We can do this by ensuring all tasks have a final frame record at a known location on their task stack, and checking that this is the final frame record in the chain. We'd like to use task_pt_regs(task)->stackframe as the final frame record, as this is already setup upon exception entry from EL0. For kernel tasks we need to consistently reserve the pt_regs and point x29 at this, which we can do with small changes to __primary_switched, __secondary_switched, and copy_process(). Since the final frame record must be at a specific location, we must create the final frame record in __primary_switched and __secondary_switched rather than leaving this to start_kernel and secondary_start_kernel. Thus, __primary_switched and __secondary_switched will now show up in stacktraces for the idle tasks. Since the final frame record is now identified by its location rather than by its contents, we identify it at the start of unwind_frame(), before we read any values from it. External debuggers may terminate the stack trace when FP == 0. In the pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the debugger may print an extra record 0x0 at the end. While this is not pretty, this does not do any harm. This is a small price to pay for having reliable stack trace termination in the kernel. That said, gdb does not show the extra record probably because it uses DWARF and not frame pointers for stack traces. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> [Mark: rebase, use ASM_BUG(), update comments, update commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-10 11:00:26 +00:00
return -EINVAL;
if (!on_accessible_stack(tsk, fp, 16, &info))
return -EINVAL;
if (test_bit(info.type, frame->stacks_done))
return -EINVAL;
/*
* As stacks grow downward, any valid record on the same stack must be
* at a strictly higher address than the prior record.
*
* Stacks can nest in several valid orders, e.g.
*
* TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
* TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
*
* ... but the nesting itself is strict. Once we transition from one
* stack to another, it's never valid to unwind back to that first
* stack.
*/
if (info.type == frame->prev_type) {
if (fp <= frame->prev_fp)
return -EINVAL;
} else {
set_bit(frame->prev_type, frame->stacks_done);
}
/*
* Record this frame record's values and location. The prev_fp and
* prev_type are only meaningful to the next unwind_frame() invocation.
*/
arm64: disable kasan when accessing frame->fp in unwind_frame When boot arm64 kernel with KASAN enabled, the below error is reported by kasan: BUG: KASAN: out-of-bounds in unwind_frame+0xec/0x260 at addr ffffffc064d57ba0 Read of size 8 by task pidof/499 page:ffffffbdc39355c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 2 PID: 499 Comm: pidof Not tainted 4.5.0-rc1 #119 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffffffc00008d078>] dump_backtrace+0x0/0x290 [<ffffffc00008d32c>] show_stack+0x24/0x30 [<ffffffc0006a981c>] dump_stack+0x8c/0xd8 [<ffffffc0002e4400>] kasan_report_error+0x558/0x588 [<ffffffc0002e4958>] kasan_report+0x60/0x70 [<ffffffc0002e3188>] __asan_load8+0x60/0x78 [<ffffffc00008c92c>] unwind_frame+0xec/0x260 [<ffffffc000087e60>] get_wchan+0x110/0x160 [<ffffffc0003b647c>] do_task_stat+0xb44/0xb68 [<ffffffc0003b7730>] proc_tgid_stat+0x40/0x50 [<ffffffc0003ac840>] proc_single_show+0x88/0xd8 [<ffffffc000345be8>] seq_read+0x370/0x770 [<ffffffc00030aba0>] __vfs_read+0xc8/0x1d8 [<ffffffc00030c0ec>] vfs_read+0x94/0x168 [<ffffffc00030d458>] SyS_read+0xb8/0x128 [<ffffffc000086530>] el0_svc_naked+0x24/0x28 Memory state around the buggy address: ffffffc064d57a80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4 ffffffc064d57b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc064d57b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffffffc064d57c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc064d57c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Since the shadow byte pointed by the report is 0, so it may mean it is just hit oob in non-current task. So, disable the instrumentation to silence these warnings. Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-08 17:13:09 +00:00
frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
frame->prev_fp = fp;
frame->prev_type = info.type;
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph tracer is in use, unwind_frame() may erroneously associate a traced function with an incorrect return address. This can happen when starting an unwind from a pt_regs, or when unwinding across an exception boundary. This can be seen when recording with perf while the function graph tracer is in use. For example: | # echo function_graph > /sys/kernel/debug/tracing/current_tracer | # perf record -g -e raw_syscalls:sys_enter:k /bin/true | # perf report ... reports the callchain erroneously as: | el0t_64_sync | el0t_64_sync_handler | el0_svc_common.constprop.0 | perf_callchain | get_perf_callchain | syscall_trace_enter | syscall_trace_enter ... whereas when the function graph tracer is not in use, it reports: | el0t_64_sync | el0t_64_sync_handler | el0_svc | do_el0_svc | el0_svc_common.constprop.0 | syscall_trace_enter | syscall_trace_enter The underlying problem is that ftrace_graph_get_ret_stack() takes an index offset from the most recent entry added to the fgraph return stack. We start an unwind at offset 0, and increment the offset each time we encounter a rewritten return address (i.e. when we see `return_to_handler`). This is broken in two cases: 1) Between creating a pt_regs and starting the unwind, function calls may place entries on the stack, leaving an arbitrary offset which we can only determine by performing a full unwind from the caller of the unwind code (and relying on none of the unwind code being instrumented). This can result in erroneous entries being reported in a backtrace recorded by perf or kfence when the function graph tracer is in use. Currently show_regs() is unaffected as dump_backtrace() performs an initial unwind. 2) When unwinding across an exception boundary (whether continuing an unwind or starting a new unwind from regs), we currently always skip the LR of the interrupted context. Where this was live and contained a rewritten address, we won't consume the corresponding fgraph ret stack entry, leaving subsequent entries off-by-one. This can result in erroneous entries being reported in a backtrace performed by any in-kernel unwinder when that backtrace crosses an exception boundary, with entries after the boundary being reported incorrectly. This includes perf, kfence, show_regs(), panic(), etc. To fix this, we need to be able to uniquely identify each rewritten return address such that we can map this back to the original return address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate each rewritten return address with a unique location on the stack. As the return address is passed in the LR (and so is not guaranteed a unique location in memory), we use the FP upon entry to the function (i.e. the address of the caller's frame record) as the return address pointer. Any nested call will have a different FP value as the caller must create its own frame record and update FP to point to this. Since ftrace_graph_ret_addr() requires the return address with the PAC stripped, the stripping of the PAC is moved before the fixup of the rewritten address. As we would unconditionally strip the PAC, moving this earlier is not harmful, and we can avoid a redundant strip in the return address fixup code. I've tested this with the perf case above, the ftrace selftests, and a number of ad-hoc unwinder tests. The tests all pass, and I have seen no unexpected behaviour as a result of this change. I've tested with pointer authentication under QEMU TCG where magic-sysrq+l correctly recovers the original return addresses. Note that this doesn't fix the issue of skipping a live LR at an exception boundary, which is a more general problem and requires more substantial rework. Were we to consume the LR in all cases this would result in warnings where the interrupted context's LR contains `return_to_handler`, but the FP has been altered, e.g. | func: | <--- ftrace entry ---> // logs FP & LR, rewrites LR | STP FP, LR, [SP, #-16]! | MOV FP, SP | <--- INTERRUPT ---> ... as ftrace_graph_get_ret_stack() fill not find a matching entry, triggering the WARN_ON_ONCE() in unwind_frame(). Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-29 16:22:45 +00:00
frame->pc = ptrauth_strip_insn_pac(frame->pc);
arm64: ftrace: fix a stack tracer's output under function graph tracer Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a) a stack tracer's output b) perf call graph (with perf record -g) c) dump_backtrace (at panic et al.) For example, in case of a), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo 1 > /proc/sys/kernel/stack_trace_enabled $ cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (54 entries) ----- ---- -------- 0) 4504 16 gic_raise_softirq+0x28/0x150 1) 4488 80 smp_cross_call+0x38/0xb8 2) 4408 48 return_to_handler+0x0/0x40 3) 4360 32 return_to_handler+0x0/0x40 ... In case of b), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ perf record -e mem:XXX:x -ag -- sleep 10 $ perf report ... | | |--0.22%-- 0x550f8 | | | 0x10888 | | | el0_svc_naked | | | sys_openat | | | return_to_handler | | | return_to_handler ... In case of c), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo c > /proc/sysrq-trigger ... Call trace: [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 [<ffffffc000092250>] return_to_handler+0x0/0x40 [<ffffffc000092250>] return_to_handler+0x0/0x40 ... This patch replaces such entries with real addresses preserved in current->ret_stack[] at unwind_frame(). This way, we can cover all the cases. Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [will: fixed minor context changes conflicting with irq stack bits] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-15 08:33:41 +00:00
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
arm64: fix dump_backtrace/unwind_frame with NULL tsk In some places, dump_backtrace() is called with a NULL tsk parameter, e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in core code. The expectation is that this is treated as if current were passed instead of NULL. Similar is true of unwind_frame(). Commit a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") didn't take this into account. In dump_backtrace() it compares tsk against current *before* we check if tsk is NULL, and in unwind_frame() we never set tsk if it is NULL. Due to this, we won't initialise irq_stack_ptr in either function. In dump_backtrace() this results in calling dump_mem() for memory immediately above the IRQ stack range, rather than for the relevant range on the task stack. In unwind_frame we'll reject unwinding frames on the IRQ stack. In either case this results in incomplete or misleading backtrace information, but is not otherwise problematic. The initial percpu areas (including the IRQ stacks) are allocated in the linear map, and dump_mem uses __get_user(), so we shouldn't access anything with side-effects, and will handle holes safely. This patch fixes the issue by having both functions handle the NULL tsk case before doing anything else with tsk. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") Acked-by: James Morse <james.morse@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yang Shi <yang.shi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-23 16:55:05 +00:00
if (tsk->ret_stack &&
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph tracer is in use, unwind_frame() may erroneously associate a traced function with an incorrect return address. This can happen when starting an unwind from a pt_regs, or when unwinding across an exception boundary. This can be seen when recording with perf while the function graph tracer is in use. For example: | # echo function_graph > /sys/kernel/debug/tracing/current_tracer | # perf record -g -e raw_syscalls:sys_enter:k /bin/true | # perf report ... reports the callchain erroneously as: | el0t_64_sync | el0t_64_sync_handler | el0_svc_common.constprop.0 | perf_callchain | get_perf_callchain | syscall_trace_enter | syscall_trace_enter ... whereas when the function graph tracer is not in use, it reports: | el0t_64_sync | el0t_64_sync_handler | el0_svc | do_el0_svc | el0_svc_common.constprop.0 | syscall_trace_enter | syscall_trace_enter The underlying problem is that ftrace_graph_get_ret_stack() takes an index offset from the most recent entry added to the fgraph return stack. We start an unwind at offset 0, and increment the offset each time we encounter a rewritten return address (i.e. when we see `return_to_handler`). This is broken in two cases: 1) Between creating a pt_regs and starting the unwind, function calls may place entries on the stack, leaving an arbitrary offset which we can only determine by performing a full unwind from the caller of the unwind code (and relying on none of the unwind code being instrumented). This can result in erroneous entries being reported in a backtrace recorded by perf or kfence when the function graph tracer is in use. Currently show_regs() is unaffected as dump_backtrace() performs an initial unwind. 2) When unwinding across an exception boundary (whether continuing an unwind or starting a new unwind from regs), we currently always skip the LR of the interrupted context. Where this was live and contained a rewritten address, we won't consume the corresponding fgraph ret stack entry, leaving subsequent entries off-by-one. This can result in erroneous entries being reported in a backtrace performed by any in-kernel unwinder when that backtrace crosses an exception boundary, with entries after the boundary being reported incorrectly. This includes perf, kfence, show_regs(), panic(), etc. To fix this, we need to be able to uniquely identify each rewritten return address such that we can map this back to the original return address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate each rewritten return address with a unique location on the stack. As the return address is passed in the LR (and so is not guaranteed a unique location in memory), we use the FP upon entry to the function (i.e. the address of the caller's frame record) as the return address pointer. Any nested call will have a different FP value as the caller must create its own frame record and update FP to point to this. Since ftrace_graph_ret_addr() requires the return address with the PAC stripped, the stripping of the PAC is moved before the fixup of the rewritten address. As we would unconditionally strip the PAC, moving this earlier is not harmful, and we can avoid a redundant strip in the return address fixup code. I've tested this with the perf case above, the ftrace selftests, and a number of ad-hoc unwinder tests. The tests all pass, and I have seen no unexpected behaviour as a result of this change. I've tested with pointer authentication under QEMU TCG where magic-sysrq+l correctly recovers the original return addresses. Note that this doesn't fix the issue of skipping a live LR at an exception boundary, which is a more general problem and requires more substantial rework. Were we to consume the LR in all cases this would result in warnings where the interrupted context's LR contains `return_to_handler`, but the FP has been altered, e.g. | func: | <--- ftrace entry ---> // logs FP & LR, rewrites LR | STP FP, LR, [SP, #-16]! | MOV FP, SP | <--- INTERRUPT ---> ... as ftrace_graph_get_ret_stack() fill not find a matching entry, triggering the WARN_ON_ONCE() in unwind_frame(). Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-29 16:22:45 +00:00
(frame->pc == (unsigned long)return_to_handler)) {
unsigned long orig_pc;
arm64: ftrace: fix a stack tracer's output under function graph tracer Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a) a stack tracer's output b) perf call graph (with perf record -g) c) dump_backtrace (at panic et al.) For example, in case of a), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo 1 > /proc/sys/kernel/stack_trace_enabled $ cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (54 entries) ----- ---- -------- 0) 4504 16 gic_raise_softirq+0x28/0x150 1) 4488 80 smp_cross_call+0x38/0xb8 2) 4408 48 return_to_handler+0x0/0x40 3) 4360 32 return_to_handler+0x0/0x40 ... In case of b), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ perf record -e mem:XXX:x -ag -- sleep 10 $ perf report ... | | |--0.22%-- 0x550f8 | | | 0x10888 | | | el0_svc_naked | | | sys_openat | | | return_to_handler | | | return_to_handler ... In case of c), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo c > /proc/sysrq-trigger ... Call trace: [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 [<ffffffc000092250>] return_to_handler+0x0/0x40 [<ffffffc000092250>] return_to_handler+0x0/0x40 ... This patch replaces such entries with real addresses preserved in current->ret_stack[] at unwind_frame(). This way, we can cover all the cases. Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [will: fixed minor context changes conflicting with irq stack bits] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-15 08:33:41 +00:00
/*
* This is a case where function graph tracer has
* modified a return address (LR) in a stack frame
* to hook a function return.
* So replace it to an original value.
*/
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph tracer is in use, unwind_frame() may erroneously associate a traced function with an incorrect return address. This can happen when starting an unwind from a pt_regs, or when unwinding across an exception boundary. This can be seen when recording with perf while the function graph tracer is in use. For example: | # echo function_graph > /sys/kernel/debug/tracing/current_tracer | # perf record -g -e raw_syscalls:sys_enter:k /bin/true | # perf report ... reports the callchain erroneously as: | el0t_64_sync | el0t_64_sync_handler | el0_svc_common.constprop.0 | perf_callchain | get_perf_callchain | syscall_trace_enter | syscall_trace_enter ... whereas when the function graph tracer is not in use, it reports: | el0t_64_sync | el0t_64_sync_handler | el0_svc | do_el0_svc | el0_svc_common.constprop.0 | syscall_trace_enter | syscall_trace_enter The underlying problem is that ftrace_graph_get_ret_stack() takes an index offset from the most recent entry added to the fgraph return stack. We start an unwind at offset 0, and increment the offset each time we encounter a rewritten return address (i.e. when we see `return_to_handler`). This is broken in two cases: 1) Between creating a pt_regs and starting the unwind, function calls may place entries on the stack, leaving an arbitrary offset which we can only determine by performing a full unwind from the caller of the unwind code (and relying on none of the unwind code being instrumented). This can result in erroneous entries being reported in a backtrace recorded by perf or kfence when the function graph tracer is in use. Currently show_regs() is unaffected as dump_backtrace() performs an initial unwind. 2) When unwinding across an exception boundary (whether continuing an unwind or starting a new unwind from regs), we currently always skip the LR of the interrupted context. Where this was live and contained a rewritten address, we won't consume the corresponding fgraph ret stack entry, leaving subsequent entries off-by-one. This can result in erroneous entries being reported in a backtrace performed by any in-kernel unwinder when that backtrace crosses an exception boundary, with entries after the boundary being reported incorrectly. This includes perf, kfence, show_regs(), panic(), etc. To fix this, we need to be able to uniquely identify each rewritten return address such that we can map this back to the original return address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate each rewritten return address with a unique location on the stack. As the return address is passed in the LR (and so is not guaranteed a unique location in memory), we use the FP upon entry to the function (i.e. the address of the caller's frame record) as the return address pointer. Any nested call will have a different FP value as the caller must create its own frame record and update FP to point to this. Since ftrace_graph_ret_addr() requires the return address with the PAC stripped, the stripping of the PAC is moved before the fixup of the rewritten address. As we would unconditionally strip the PAC, moving this earlier is not harmful, and we can avoid a redundant strip in the return address fixup code. I've tested this with the perf case above, the ftrace selftests, and a number of ad-hoc unwinder tests. The tests all pass, and I have seen no unexpected behaviour as a result of this change. I've tested with pointer authentication under QEMU TCG where magic-sysrq+l correctly recovers the original return addresses. Note that this doesn't fix the issue of skipping a live LR at an exception boundary, which is a more general problem and requires more substantial rework. Were we to consume the LR in all cases this would result in warnings where the interrupted context's LR contains `return_to_handler`, but the FP has been altered, e.g. | func: | <--- ftrace entry ---> // logs FP & LR, rewrites LR | STP FP, LR, [SP, #-16]! | MOV FP, SP | <--- INTERRUPT ---> ... as ftrace_graph_get_ret_stack() fill not find a matching entry, triggering the WARN_ON_ONCE() in unwind_frame(). Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-29 16:22:45 +00:00
orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
(void *)frame->fp);
if (WARN_ON_ONCE(frame->pc == orig_pc))
return -EINVAL;
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph tracer is in use, unwind_frame() may erroneously associate a traced function with an incorrect return address. This can happen when starting an unwind from a pt_regs, or when unwinding across an exception boundary. This can be seen when recording with perf while the function graph tracer is in use. For example: | # echo function_graph > /sys/kernel/debug/tracing/current_tracer | # perf record -g -e raw_syscalls:sys_enter:k /bin/true | # perf report ... reports the callchain erroneously as: | el0t_64_sync | el0t_64_sync_handler | el0_svc_common.constprop.0 | perf_callchain | get_perf_callchain | syscall_trace_enter | syscall_trace_enter ... whereas when the function graph tracer is not in use, it reports: | el0t_64_sync | el0t_64_sync_handler | el0_svc | do_el0_svc | el0_svc_common.constprop.0 | syscall_trace_enter | syscall_trace_enter The underlying problem is that ftrace_graph_get_ret_stack() takes an index offset from the most recent entry added to the fgraph return stack. We start an unwind at offset 0, and increment the offset each time we encounter a rewritten return address (i.e. when we see `return_to_handler`). This is broken in two cases: 1) Between creating a pt_regs and starting the unwind, function calls may place entries on the stack, leaving an arbitrary offset which we can only determine by performing a full unwind from the caller of the unwind code (and relying on none of the unwind code being instrumented). This can result in erroneous entries being reported in a backtrace recorded by perf or kfence when the function graph tracer is in use. Currently show_regs() is unaffected as dump_backtrace() performs an initial unwind. 2) When unwinding across an exception boundary (whether continuing an unwind or starting a new unwind from regs), we currently always skip the LR of the interrupted context. Where this was live and contained a rewritten address, we won't consume the corresponding fgraph ret stack entry, leaving subsequent entries off-by-one. This can result in erroneous entries being reported in a backtrace performed by any in-kernel unwinder when that backtrace crosses an exception boundary, with entries after the boundary being reported incorrectly. This includes perf, kfence, show_regs(), panic(), etc. To fix this, we need to be able to uniquely identify each rewritten return address such that we can map this back to the original return address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate each rewritten return address with a unique location on the stack. As the return address is passed in the LR (and so is not guaranteed a unique location in memory), we use the FP upon entry to the function (i.e. the address of the caller's frame record) as the return address pointer. Any nested call will have a different FP value as the caller must create its own frame record and update FP to point to this. Since ftrace_graph_ret_addr() requires the return address with the PAC stripped, the stripping of the PAC is moved before the fixup of the rewritten address. As we would unconditionally strip the PAC, moving this earlier is not harmful, and we can avoid a redundant strip in the return address fixup code. I've tested this with the perf case above, the ftrace selftests, and a number of ad-hoc unwinder tests. The tests all pass, and I have seen no unexpected behaviour as a result of this change. I've tested with pointer authentication under QEMU TCG where magic-sysrq+l correctly recovers the original return addresses. Note that this doesn't fix the issue of skipping a live LR at an exception boundary, which is a more general problem and requires more substantial rework. Were we to consume the LR in all cases this would result in warnings where the interrupted context's LR contains `return_to_handler`, but the FP has been altered, e.g. | func: | <--- ftrace entry ---> // logs FP & LR, rewrites LR | STP FP, LR, [SP, #-16]! | MOV FP, SP | <--- INTERRUPT ---> ... as ftrace_graph_get_ret_stack() fill not find a matching entry, triggering the WARN_ON_ONCE() in unwind_frame(). Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-10-29 16:22:45 +00:00
frame->pc = orig_pc;
arm64: ftrace: fix a stack tracer's output under function graph tracer Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a) a stack tracer's output b) perf call graph (with perf record -g) c) dump_backtrace (at panic et al.) For example, in case of a), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo 1 > /proc/sys/kernel/stack_trace_enabled $ cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (54 entries) ----- ---- -------- 0) 4504 16 gic_raise_softirq+0x28/0x150 1) 4488 80 smp_cross_call+0x38/0xb8 2) 4408 48 return_to_handler+0x0/0x40 3) 4360 32 return_to_handler+0x0/0x40 ... In case of b), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ perf record -e mem:XXX:x -ag -- sleep 10 $ perf report ... | | |--0.22%-- 0x550f8 | | | 0x10888 | | | el0_svc_naked | | | sys_openat | | | return_to_handler | | | return_to_handler ... In case of c), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo c > /proc/sysrq-trigger ... Call trace: [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 [<ffffffc000092250>] return_to_handler+0x0/0x40 [<ffffffc000092250>] return_to_handler+0x0/0x40 ... This patch replaces such entries with real addresses preserved in current->ret_stack[] at unwind_frame(). This way, we can cover all the cases. Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [will: fixed minor context changes conflicting with irq stack bits] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-15 08:33:41 +00:00
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
#ifdef CONFIG_KRETPROBES
if (is_kretprobe_trampoline(frame->pc))
frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp, &frame->kr_cur);
#endif
arm64: ftrace: fix a stack tracer's output under function graph tracer Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a) a stack tracer's output b) perf call graph (with perf record -g) c) dump_backtrace (at panic et al.) For example, in case of a), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo 1 > /proc/sys/kernel/stack_trace_enabled $ cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (54 entries) ----- ---- -------- 0) 4504 16 gic_raise_softirq+0x28/0x150 1) 4488 80 smp_cross_call+0x38/0xb8 2) 4408 48 return_to_handler+0x0/0x40 3) 4360 32 return_to_handler+0x0/0x40 ... In case of b), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ perf record -e mem:XXX:x -ag -- sleep 10 $ perf report ... | | |--0.22%-- 0x550f8 | | | 0x10888 | | | el0_svc_naked | | | sys_openat | | | return_to_handler | | | return_to_handler ... In case of c), $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo c > /proc/sysrq-trigger ... Call trace: [<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30 [<ffffffc000092250>] return_to_handler+0x0/0x40 [<ffffffc000092250>] return_to_handler+0x0/0x40 ... This patch replaces such entries with real addresses preserved in current->ret_stack[] at unwind_frame(). This way, we can cover all the cases. Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [will: fixed minor context changes conflicting with irq stack bits] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-15 08:33:41 +00:00
return 0;
}
NOKPROBE_SYMBOL(unwind_frame);
void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
bool (*fn)(void *, unsigned long), void *data)
{
while (1) {
int ret;
if (!fn(data, frame->pc))
break;
ret = unwind_frame(tsk, frame);
if (ret < 0)
break;
}
}
NOKPROBE_SYMBOL(walk_stackframe);
static void dump_backtrace_entry(unsigned long where, const char *loglvl)
{
arm64: stacktrace: use %pSb for backtrace printing Let's use the new printk format to print the stacktrace entry when printing a backtrace to the kernel logs. This will include any module's build ID[1] in it so that offline/crash debugging can easily locate the debuginfo for a module via something like debuginfod[2]. Link: https://lkml.kernel.org/r/20210511003845.2429846-7-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sasha Levin <sashal@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 01:09:24 +00:00
printk("%s %pSb\n", loglvl, (void *)where);
}
void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl)
{
struct stackframe frame;
int skip = 0;
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
if (regs) {
if (user_mode(regs))
return;
skip = 1;
}
if (!tsk)
tsk = current;
if (!try_get_task_stack(tsk))
return;
if (tsk == current) {
start_backtrace(&frame,
(unsigned long)__builtin_frame_address(0),
(unsigned long)dump_backtrace);
} else {
/*
* task blocked in __switch_to
*/
start_backtrace(&frame,
thread_saved_fp(tsk),
thread_saved_pc(tsk));
}
printk("%sCall trace:\n", loglvl);
do {
/* skip until specified stack frame */
if (!skip) {
dump_backtrace_entry(frame.pc, loglvl);
} else if (frame.fp == regs->regs[29]) {
skip = 0;
/*
* Mostly, this is the case where this function is
* called in panic/abort. As exception handler's
* stack frame does not contain the corresponding pc
* at which an exception has taken place, use regs->pc
* instead.
*/
dump_backtrace_entry(regs->pc, loglvl);
}
} while (!unwind_frame(tsk, &frame));
put_task_stack(tsk);
}
void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
{
dump_backtrace(NULL, tsk, loglvl);
barrier();
}
#ifdef CONFIG_STACKTRACE
arm64: stacktrace: avoid tracing arch_stack_walk() When the function_graph tracer is in use, arch_stack_walk() may unwind the stack incorrectly, erroneously reporting itself, missing the final entry which is being traced, and reporting all traced entries between these off-by-one from where they should be. When ftrace hooks a function return, the original return address is saved to the fgraph ret_stack, and the return address in the LR (or the function's frame record) is replaced with `return_to_handler`. When arm64's unwinder encounter frames returning to `return_to_handler`, it finds the associated original return address from the fgraph ret stack, assuming the most recent `ret_to_hander` entry on the stack corresponds to the most recent entry in the fgraph ret stack, and so on. When arch_stack_walk() is used to dump the current task's stack, it starts from the caller of arch_stack_walk(). However, arch_stack_walk() can be traced, and so may push an entry on to the fgraph ret stack, leaving the fgraph ret stack offset by one from the expected position. This can be seen when dumping the stack via /proc/self/stack, where enabling the graph tracer results in an unexpected `stack_trace_save_tsk` entry at the start of the trace, and `el0_svc` missing form the end of the trace. This patch fixes this by marking arch_stack_walk() as notrace, as we do for all other functions on the path to ftrace_graph_get_ret_stack(). While a few helper functions are not marked notrace, their calls/returns are balanced, and will have no observable effect when examining the fgraph ret stack. It is possible for an exeption boundary to cause a similar offset if the return address of the interrupted context was in the LR. Fixing those cases will require some more substantial rework, and is left for subsequent patches. Before: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c | # echo function_graph > /sys/kernel/tracing/current_tracer | # cat /proc/self/stack | [<0>] stack_trace_save_tsk+0xa4/0x110 | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c After: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c | # echo function_graph > /sys/kernel/tracing/current_tracer | # cat /proc/self/stack | [<0>] proc_pid_stack+0xc4/0x140 | [<0>] proc_single_show+0x6c/0x120 | [<0>] seq_read_iter+0x240/0x4e0 | [<0>] seq_read+0xe8/0x140 | [<0>] vfs_read+0xb8/0x1e4 | [<0>] ksys_read+0x74/0x100 | [<0>] __arm64_sys_read+0x28/0x3c | [<0>] invoke_syscall+0x50/0x120 | [<0>] el0_svc_common.constprop.0+0xc4/0xd4 | [<0>] do_el0_svc+0x30/0x9c | [<0>] el0_svc+0x2c/0x54 | [<0>] el0t_64_sync_handler+0x1a8/0x1b0 | [<0>] el0t_64_sync+0x198/0x19c Cc: <stable@vger.kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviwed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210802164845.45506-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-02 16:48:45 +00:00
noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
arm64: stacktrace: don't trace arch_stack_walk() We recently converted arm64 to use arch_stack_walk() in commit: 5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK") The core stacktrace code expects that (when tracing the current task) arch_stack_walk() starts a trace at its caller, and does not include itself in the trace. However, arm64's arch_stack_walk() includes itself, and so traces include one more entry than callers expect. The core stacktrace code which calls arch_stack_walk() tries to skip a number of entries to prevent itself appearing in a trace, and the additional entry prevents skipping one of the core stacktrace functions, leaving this in the trace unexpectedly. We can fix this by having arm64's arch_stack_walk() begin the trace with its caller. The first value returned by the trace will be __builtin_return_address(0), i.e. the caller of arch_stack_walk(). The first frame record to be unwound will be __builtin_frame_address(1), i.e. the caller's frame record. To prevent surprises, arch_stack_walk() is also marked noinline. While __builtin_frame_address(1) is not safe in portable code, local GCC developers have confirmed that it is safe on arm64. To find the caller's frame record, the builtin can safely dereference the current function's frame record or (in theory) could stash the original FP into another GPR at function entry time, neither of which are problematic. Prior to this patch, the tracing code would unexpectedly show up in traces of the current task, e.g. | # cat /proc/self/stack | [<0>] stack_trace_save_tsk+0x98/0x100 | [<0>] proc_pid_stack+0xb4/0x130 | [<0>] proc_single_show+0x60/0x110 | [<0>] seq_read_iter+0x230/0x4d0 | [<0>] seq_read+0xdc/0x130 | [<0>] vfs_read+0xac/0x1e0 | [<0>] ksys_read+0x6c/0xfc | [<0>] __arm64_sys_read+0x20/0x30 | [<0>] el0_svc_common.constprop.0+0x60/0x120 | [<0>] do_el0_svc+0x24/0x90 | [<0>] el0_svc+0x2c/0x54 | [<0>] el0_sync_handler+0x1a4/0x1b0 | [<0>] el0_sync+0x170/0x180 After this patch, the tracing code will not show up in such traces: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xb4/0x130 | [<0>] proc_single_show+0x60/0x110 | [<0>] seq_read_iter+0x230/0x4d0 | [<0>] seq_read+0xdc/0x130 | [<0>] vfs_read+0xac/0x1e0 | [<0>] ksys_read+0x6c/0xfc | [<0>] __arm64_sys_read+0x20/0x30 | [<0>] el0_svc_common.constprop.0+0x60/0x120 | [<0>] do_el0_svc+0x24/0x90 | [<0>] el0_svc+0x2c/0x54 | [<0>] el0_sync_handler+0x1a4/0x1b0 | [<0>] el0_sync+0x170/0x180 Erring on the side of caution, I've given this a spin with a bunch of toolchains, verifying the output of /proc/self/stack and checking that the assembly looked sound. For GCC (where we require version 5.1.0 or later) I tested with the kernel.org crosstool binares for versions 5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and 10.1.0. For clang (where we require version 10.0.1 or later) I tested with the llvm.org binary releases of 11.0.0, and 11.0.1. Fixes: 5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jun <chenjun102@huawei.com> Cc: Marco Elver <elver@google.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> # 5.10.x Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-03-19 18:41:06 +00:00
void *cookie, struct task_struct *task,
struct pt_regs *regs)
{
struct stackframe frame;
if (regs)
start_backtrace(&frame, regs->regs[29], regs->pc);
else if (task == current)
start_backtrace(&frame,
arm64: stacktrace: don't trace arch_stack_walk() We recently converted arm64 to use arch_stack_walk() in commit: 5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK") The core stacktrace code expects that (when tracing the current task) arch_stack_walk() starts a trace at its caller, and does not include itself in the trace. However, arm64's arch_stack_walk() includes itself, and so traces include one more entry than callers expect. The core stacktrace code which calls arch_stack_walk() tries to skip a number of entries to prevent itself appearing in a trace, and the additional entry prevents skipping one of the core stacktrace functions, leaving this in the trace unexpectedly. We can fix this by having arm64's arch_stack_walk() begin the trace with its caller. The first value returned by the trace will be __builtin_return_address(0), i.e. the caller of arch_stack_walk(). The first frame record to be unwound will be __builtin_frame_address(1), i.e. the caller's frame record. To prevent surprises, arch_stack_walk() is also marked noinline. While __builtin_frame_address(1) is not safe in portable code, local GCC developers have confirmed that it is safe on arm64. To find the caller's frame record, the builtin can safely dereference the current function's frame record or (in theory) could stash the original FP into another GPR at function entry time, neither of which are problematic. Prior to this patch, the tracing code would unexpectedly show up in traces of the current task, e.g. | # cat /proc/self/stack | [<0>] stack_trace_save_tsk+0x98/0x100 | [<0>] proc_pid_stack+0xb4/0x130 | [<0>] proc_single_show+0x60/0x110 | [<0>] seq_read_iter+0x230/0x4d0 | [<0>] seq_read+0xdc/0x130 | [<0>] vfs_read+0xac/0x1e0 | [<0>] ksys_read+0x6c/0xfc | [<0>] __arm64_sys_read+0x20/0x30 | [<0>] el0_svc_common.constprop.0+0x60/0x120 | [<0>] do_el0_svc+0x24/0x90 | [<0>] el0_svc+0x2c/0x54 | [<0>] el0_sync_handler+0x1a4/0x1b0 | [<0>] el0_sync+0x170/0x180 After this patch, the tracing code will not show up in such traces: | # cat /proc/self/stack | [<0>] proc_pid_stack+0xb4/0x130 | [<0>] proc_single_show+0x60/0x110 | [<0>] seq_read_iter+0x230/0x4d0 | [<0>] seq_read+0xdc/0x130 | [<0>] vfs_read+0xac/0x1e0 | [<0>] ksys_read+0x6c/0xfc | [<0>] __arm64_sys_read+0x20/0x30 | [<0>] el0_svc_common.constprop.0+0x60/0x120 | [<0>] do_el0_svc+0x24/0x90 | [<0>] el0_svc+0x2c/0x54 | [<0>] el0_sync_handler+0x1a4/0x1b0 | [<0>] el0_sync+0x170/0x180 Erring on the side of caution, I've given this a spin with a bunch of toolchains, verifying the output of /proc/self/stack and checking that the assembly looked sound. For GCC (where we require version 5.1.0 or later) I tested with the kernel.org crosstool binares for versions 5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and 10.1.0. For clang (where we require version 10.0.1 or later) I tested with the llvm.org binary releases of 11.0.0, and 11.0.1. Fixes: 5fc57df2f6fd ("arm64: stacktrace: Convert to ARCH_STACKWALK") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jun <chenjun102@huawei.com> Cc: Marco Elver <elver@google.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> # 5.10.x Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-03-19 18:41:06 +00:00
(unsigned long)__builtin_frame_address(1),
(unsigned long)__builtin_return_address(0));
else
start_backtrace(&frame, thread_saved_fp(task),
thread_saved_pc(task));
walk_stackframe(task, &frame, consume_entry, cookie);
}
#endif