ftrace_graph_ret_addr() takes an 'idx' integer pointer that is used to
optimize the stack unwinding process. arm64 currently passes `NULL` for
this parameter, which prevents it from using those optimizations. Worse,
ftrace_graph_ret_addr() simply returns the passed-in return address when
the 'idx' pointer is NULL, which breaks this usage.
Pass a valid integer pointer to ftrace_graph_ret_addr(), as x86_64's
stack unwinder does.
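A minimal sketch of the shape of the fix (assuming the unwinder keeps a
per-walk index in its state; the 'graph_idx' field name is illustrative):

  /*
   * Sketch: pass a persistent index so ftrace_graph_ret_addr() can
   * resume its search in the fgraph return stack instead of bailing
   * out on NULL (and instead of rescanning from the top every call).
   * 'graph_idx' is an assumed field added to the unwind state.
   */
  orig_pc = ftrace_graph_ret_addr(state->task, &state->graph_idx,
                                  state->pc, (void *)state->fp);
  state->pc = orig_pc;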
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Fixes: 29c1c24a27 ("function_graph: Fix up ftrace_graph_ret_addr()")
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618162342.28275-1-puranjay@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, userstacktrace is unsupported for the ftrace and uprobe
tracers on arm64. This patch uses the perf_callchain_user() code as a
blueprint to implement arch_stack_walk_user(), which adds userstacktrace
support on arm64.
In turn, arch_stack_walk_user() can be used to simplify the
implementation of perf_callchain_user().
This patch has been tested with the ftrace, uprobe and perf tracers
profiling userstacktrace cases.
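A condensed sketch of the approach (modelled on the perf_callchain_user()
frame-record walk; the 'frame_tail' naming follows that code, AArch32
compat handling is omitted, and a real walker must use a fault-safe user
copy where plain copy_from_user() is not permitted):

  struct frame_tail {
          struct frame_tail __user *fp;
          unsigned long lr;
  } __attribute__((packed));

  void arch_stack_walk_user(stack_trace_consume_fn consume_entry,
                            void *cookie, const struct pt_regs *regs)
  {
          struct frame_tail __user *tail;

          if (!consume_entry(cookie, regs->pc))
                  return;

          /*
           * Walk the AArch64 user frame-record chain: each record holds
           * the caller's FP and LR and must be naturally aligned.
           */
          tail = (struct frame_tail __user *)regs->regs[29];
          while (tail && !((unsigned long)tail & 0x7)) {
                  struct frame_tail buftail;

                  if (copy_from_user(&buftail, tail, sizeof(buftail)))
                          break;
                  if (!consume_entry(cookie, buftail.lr))
                          break;
                  /* Frame records must strictly grow up the stack. */
                  if (tail >= buftail.fp)
                          break;
                  tail = buftail.fp;
          }
  }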
Tested-by: chenqiwu <qiwu.chen@transsion.com>
Signed-off-by: chenqiwu <qiwu.chen@transsion.com>
Link: https://lore.kernel.org/r/20231219022229.10230-1-qiwu.chen@transsion.com
Signed-off-by: Will Deacon <will@kernel.org>
Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seek to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power (SP) and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
This will be used by bpf_throw() to unwind until the program marked as
the exception boundary is reached, and to run the callback with the
stack of the main program.
This is required for supporting BPF exceptions on ARM64.
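A sketch of how the arm64 implementation can sit on top of the kunwind
interface (the adapter names are illustrative):

  struct bpf_unwind_consume_entry_data {
          bool (*consume_entry)(void *cookie, u64 ip, u64 sp, u64 fp);
          void *cookie;
  };

  static bool bpf_unwind_consume_entry(const struct kunwind_state *state,
                                       void *cookie)
  {
          struct bpf_unwind_consume_entry_data *data = cookie;

          /* Forward the PC and FP of each frame to the BPF callback. */
          return data->consume_entry(data->cookie, state->common.pc, 0,
                                     state->common.fp);
  }

  void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip,
                                              u64 sp, u64 fp),
                           void *cookie)
  {
          struct bpf_unwind_consume_entry_data data = {
                  .consume_entry = consume_fn,
                  .cookie = cookie,
          };

          kunwind_stack_walk(bpf_unwind_consume_entry, &data, current,
                             NULL);
  }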
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240201125225.72796-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently arm64 uses the generic arch_stack_walk() interface for all
stack walking code. This only passes a PC value and cookie to the unwind
callback, whereas we'd like to pass some additional information in some
cases. For example, the BPF exception unwinder wants the FP, for
reliable stacktrace we'll want to perform additional checks on other
portions of unwind state, and we'd like to expand the information
printed by dump_backtrace() to include provenance and reliability
information.
As preparation for all of the above, this patch factors the core unwind
logic out of arch_stack_walk() and into a new kunwind_stack_walk()
function which provides all of the unwind state to a callback function.
The existing arch_stack_walk() interface is implemented atop this.
The kunwind_stack_walk() function is intended to be a private
implementation detail of unwinders in stacktrace.c, and not something to
be exported generally to kernel code. It is __always_inline'd into its
caller so that neither it nor its caller appears in stacktraces (which is
the existing/required behavior for arch_stack_walk() and friends) and so
that the compiler can optimize away some of the indirection.
There should be no functional change as a result of this patch.
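A condensed sketch of the resulting structure (the initializers and the
core unwind loop are elided; details are illustrative):

  typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state,
                                     void *cookie);

  /* Core unwinder: hands the full unwind state to the callback. */
  static __always_inline void
  kunwind_stack_walk(kunwind_consume_fn consume_state, void *cookie,
                     struct task_struct *task, struct pt_regs *regs)
  {
          struct kunwind_state state;

          if (regs) {
                  if (task != current)
                          return;
                  kunwind_init_from_regs(&state, regs);
          } else if (task == current) {
                  kunwind_init_from_caller(&state);
          } else {
                  kunwind_init_from_task(&state, task);
          }

          do_kunwind(&state, consume_state, cookie);
  }

  struct kunwind_consume_entry_data {
          stack_trace_consume_fn consume_entry;
          void *cookie;
  };

  /* arch_stack_walk() becomes a thin adapter that extracts only the PC. */
  static __always_inline bool
  arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
  {
          struct kunwind_consume_entry_data *data = cookie;

          return data->consume_entry(data->cookie, state->common.pc);
  }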
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
On arm64 we share some unwinding code between the regular kernel
unwinder and the KVM hyp unwinder. Some of this common code only matters
to the regular unwinder, e.g. the `kr_cur` and `task` fields in the
common struct unwind_state.
We're likely to add more state which only matters for regular kernel
unwinding (or only for hyp unwinding). In preparation for such changes,
this patch factors out the kernel-specific state into a new struct
kunwind_state, and updates the kernel unwind code accordingly.
There should be no functional change as a result of this patch.
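A sketch of the resulting split (the exact layout is illustrative):

  /* State common to the kernel and KVM hyp unwinders. */
  struct unwind_state {
          unsigned long fp;
          unsigned long pc;
          struct stack_info stack;
          struct stack_info *stacks;
          int nr_stacks;
  };

  /* Kernel-only unwind state wraps the common state. */
  struct kunwind_state {
          struct unwind_state common;
          struct task_struct *task;
  #ifdef CONFIG_KRETPROBES
          struct llist_node *kr_cur;
  #endif
  };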
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently we strip the PAC from pointers using C code, which requires
generating bitmasks, and conditionally clearing/setting bits depending
on bit 55. We can do better by using XPACLRI directly.
When the logic was originally written to strip PACs from user pointers,
contemporary toolchains used for the kernel had assemblers which were
unaware of the PAC instructions. As stripping the PAC from userspace
pointers required unconditional clearing of a fixed set of bits (which
could be performed with a single instruction), it was simpler to
implement the masking in C than it was to make use of XPACI or XPACLRI.
When support for in-kernel pointer authentication was added, the
stripping logic was extended to cover TTBR1 pointers, requiring several
instructions to handle whether to clear/set bits dependent on bit 55 of
the pointer.
This patch simplifies the stripping of PACs by using XPACLRI directly,
as contemporary toolchains do within __builtin_return_address(). This
saves a number of instructions, especially where
__builtin_return_address() does not implicitly strip the PAC but is
heavily used (e.g. with tracepoints). As the kernel might be compiled
with an assembler without knowledge of XPACLRI, it is assembled using
the 'HINT #7' alias, which results in an identical opcode.
At the same time, I've split ptrauth_strip_insn_pac() into
ptrauth_strip_user_insn_pac() and ptrauth_strip_kernel_insn_pac()
helpers so that we can avoid unnecessary PAC stripping when pointer
authentication is not in use in userspace or kernel respectively.
The underlying xpaclri() macro uses inline assembly which clobbers x30.
The clobber causes the compiler to save/restore the original x30 value
in a frame record (protected with PACIASP and AUTIASP when in-kernel
authentication is enabled), so this does not provide a gadget to alter
the return address. Similarly this does not adversely affect unwinding
due to the presence of the frame record.
The ptrauth_user_pac_mask() and ptrauth_kernel_pac_mask() are exported
from the kernel in ptrace and core dumps, so these are retained. A
subsequent patch will move them out of <asm/compiler.h>.
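For illustration, a sketch of the stripping primitive (the exact macro
body in the patch may differ):

  /*
   * XPACLRI strips the PAC from the pointer held in x30/LR. "hint #7"
   * is an alias for the same opcode, so this assembles even with
   * toolchains unaware of the PAC instructions. Clobbering x30 forces
   * the compiler to preserve the real return address in a frame record.
   */
  #define xpaclri(ptr)                                                  \
  ({                                                                    \
          register unsigned long __xpaclri_ptr asm("x30") = (ptr);      \
                                                                        \
          asm("hint #7" : "+r" (__xpaclri_ptr));                        \
                                                                        \
          __xpaclri_ptr;                                                \
  })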
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230412160134.306148-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The arm64 stacktrace code can be used in kprobe context, and so cannot
be safely probed. Some (but not all) of the unwind functions are
annotated with `NOKPROBE_SYMBOL()` to ensure this, with others marked as
`__always_inline`, relying on the top-level unwind function being marked
as `noinstr`.
This patch has stacktrace.c consistently mark the internal stacktrace
functions as `__always_inline`, removing the need for NOKPROBE_SYMBOL()
as the top-level unwind function (arch_stack_walk()) is marked as
`noinstr`. This is more consistent and is a simpler pattern to follow
for future additions to stacktrace.c.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
For historical reasons, the backtrace dumping functions are placed in
the middle of stacktrace.c, despite using functions defined later. For
clarity, and to make subsequent refactoring easier, move the dumping
functions to the end of stacktrace.c
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The function which calls the top-level backtracing function may have
been instrumented with ftrace and/or kprobes, and hence the first return
address may have been rewritten.
Factor out the existing fgraph / kretprobes address recovery, and use
this for the first address. As the comment for the fgraph case isn't all
that helpful, I've also dropped that.
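The factored-out helper looks roughly like this (a sketch; the kretprobe
recovery is shown alongside the fgraph case for completeness):

  static __always_inline void
  unwind_recover_return_address(struct unwind_state *state)
  {
  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
          if (state->task->ret_stack &&
              (state->pc == (unsigned long)return_to_handler)) {
                  unsigned long orig_pc;

                  /*
                   * Map the rewritten address back via the FP, which
                   * uniquely identifies the fgraph ret stack entry.
                   */
                  orig_pc = ftrace_graph_ret_addr(state->task, NULL,
                                                  state->pc,
                                                  (void *)state->fp);
                  if (WARN_ON_ONCE(state->pc == orig_pc))
                          return;
                  state->pc = orig_pc;
          }
  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */

  #ifdef CONFIG_KRETPROBES
          if (is_kretprobe_trampoline(state->pc))
                  state->pc = kretprobe_find_ret_addr(state->task,
                                                      (void *)state->fp,
                                                      &state->kr_cur);
  #endif /* CONFIG_KRETPROBES */
  }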
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The EFI runtime services run from a dedicated stack now, and so the
stack unwinder needs to be informed about this.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently unwind_next_frame_record() has an optional callback to convert
the address space of the FP. This is necessary for the NVHE unwinder,
which tracks the stacks in the hyp VA space, but accesses the frame
records in the kernel VA space.
This is a bit unfortunate since it clutters unwind_next_frame_record(),
which will get in the way of future rework.
Instead, this patch changes the NVHE unwinder to track the stacks in the
kernel's VA space and translate the FP prior to calling
unwind_next_frame_record(). This removes the need for the translate_fp()
callback, as all unwinders consistently track stacks in the native
address space of the unwinder.
At the same time, this patch consolidates the generation of the stack
addresses behind the stackinfo_get_*() helpers.
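As an example of the consolidated helpers, a sketch of the task-stack
variant:

  /* Describe a task's stack as a [low, high) address range. */
  static inline struct stack_info
  stackinfo_get_task(const struct task_struct *tsk)
  {
          unsigned long low = (unsigned long)task_stack_page(tsk);
          unsigned long high = low + THREAD_SIZE;

          return (struct stack_info) {
                  .low = low,
                  .high = high,
          };
  }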
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we call an on_accessible_stack() callback for each step of the
unwinder, requiring redundant work to be performed in the core of the
unwind loop (e.g. disabling preemption around accesses to per-cpu
variables containing stack boundaries). To prevent unwind loops which go
through a stack multiple times, we have to track the set of unwound
stacks, requiring a stack_type enum which needs to cater for all the
stacks of all possible callees. To prevent loops within a stack, we must
track the prior FP values.
This patch reworks the unwinder to minimize the work in the core of the
unwinder, and to remove the need for the stack_type enum. The set of
accessible stacks (and their boundaries) are determined at the start of
the unwind, and the current stack is tracked during the unwind, with
completed stacks removed from the set of accessible stacks. This makes
the boundary checks more accurate (e.g. detecting overlapped frame
records), and removes the need for separate tracking of the prior FP and
visited stacks.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In subsequent patches we'll want to acquire the stack boundaries
ahead-of-time, and we'll need to be able to acquire the relevant
stack_info regardless of whether we have an object that happens to be on
the stack.
This patch replaces the on_XXX_stack() helpers with stackinfo_get_XXX()
helpers, with the caller being responsible for checking whether an
object is on a relevant stack. For the moment this is moved into the
on_accessible_stack() functions, making these slightly larger;
subsequent patches will remove the on_accessible_stack() functions and
simplify the logic.
The on_irq_stack() and on_task_stack() helpers are kept as these are
used by IRQ entry sequences and stackleak respectively. As they're only
used as predicates, the stack_info pointer parameter is removed in both
cases.
As the on_accessible_stack() functions are always passed a non-NULL info
pointer, these now update info unconditionally. When updating the type
to STACK_TYPE_UNKNOWN, the low/high bounds are also modified, but as
these will not be consumed, this should have no adverse effect.
There should be no functional change as a result of this patch.
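A sketch of the predicate callers use against a stack_info
(illustrative):

  /*
   * Does the object at sp, of the given size, lie entirely within the
   * stack described by info? An unknown stack has info->low == 0, and
   * sp + size < sp catches address-range overflow.
   */
  static inline bool stackinfo_on_stack(const struct stack_info *info,
                                        unsigned long sp,
                                        unsigned long size)
  {
          if (!info->low)
                  return false;

          if (sp < info->low || sp + size < sp || sp + size > info->high)
                  return false;

          return true;
  }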
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For clarity and ease of maintenance, it would be helpful for all the
stack helpers to be in the same place.
Move the SDEI stack helpers into the stacktrace code where all the other
stack helpers live.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The unwind_next_common() function unwinds a single frame record. There
are other unwind steps (e.g. unwinding through trampolines) which are
handled in the regular kernel unwinder, and in future there may be other
common unwind helpers.
Clarify the purpose of unwind_next_common() by renaming it to
unwind_next_frame_record(). At the same time, add commentary, and delete
the redundant comment at the top of asm/stacktrace/common.h.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently unwind_next_common() takes a pointer to a stack_info which is
only ever used within unwind_next_common().
Make it a local variable and simplify callers.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Having multiple versions of on_accessible_stack() (one per unwinder)
makes it very hard to reason about what is used where due to the
complexity of the various includes, the forward declarations, and
the reliance on everything being 'inline'.
Instead, move the code back where it should be. Each unwinder
implements:
- on_accessible_stack() as well as the helpers it depends on,
- unwind()/unwind_next(), as they pass on_accessible_stack as
a parameter to unwind_next_common() (which is the only common
code here)
This hardly results in any duplication, and makes it much
easier to reason about the code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20220727142906.1856759-4-maz@kernel.org
Move unwind() to stacktrace/common.h, and as a result
the kernel unwind_next() to asm/stacktrace.h. This allows
reusing unwind() in the implementation of the nVHE HYP
stack unwinder, later in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-6-kaleshsingh@google.com
The unwinder code is made reusable so that it can be used to
unwind various types of stacks. One use case is unwinding the
nVHE hyp stack from the host (EL1) in non-protected mode. This
means that the unwinder must be able to translate HYP stack
addresses to kernel addresses.
Add a callback (stack_trace_translate_fp_fn) to allow specifying
the translation function.
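A sketch of the callback's shape (illustrative):

  /*
   * Translate a frame pointer from one address space to another, e.g.
   * from a HYP VA to a kernel VA. Returns false if the FP is invalid.
   */
  typedef bool (*stack_trace_translate_fp_fn)(unsigned long *fp,
                                              enum stack_type type);

When supplied, the unwinder invokes it on each FP before dereferencing
the frame record.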
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-5-kaleshsingh@google.com
Move common unwind_next logic to stacktrace/common.h. This allows
reusing the code in the implementation of the nVHE hypervisor stack
unwinder, later in this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-4-kaleshsingh@google.com
In order to reuse the arm64 stack unwinding logic for the nVHE
hypervisor stack, move the common code to a shared header
(arch/arm64/include/asm/stacktrace/common.h).
The nVHE hypervisor cannot safely link against kernel code, so we
make use of the shared header to avoid duplicated logic later in
this series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-2-kaleshsingh@google.com
Copy the task argument passed to arch_stack_walk() to unwind_state so that
it can be passed to unwind functions via unwind_state rather than as a
separate argument. The task is a fundamental part of the unwind state.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-3-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
unwind_init() is currently a single function that initializes all of the
unwind state. Split it into the following functions and call them
appropriately:
- unwind_init_from_regs() - initialize from regs passed by caller.
- unwind_init_from_caller() - initialize for the current task
from the caller of arch_stack_walk().
- unwind_init_from_task() - initialize from the saved state of a
task other than the current task. In this case, the other
task must not be running.
This is done for two reasons:
- the different ways of initializing are clear
- specialized code can be added to each initializer in the future.
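For example, a sketch of the caller-based initializer (assuming a shared
unwind_init_common() helper for the common fields):

  static inline void unwind_init_from_caller(struct unwind_state *state)
  {
          unwind_init_common(state, current);

          /*
           * Start from our caller's frame record and return address,
           * so the unwind does not report the unwind code itself.
           */
          state->fp = (unsigned long)__builtin_frame_address(1);
          state->pc = (unsigned long)__builtin_return_address(0);
  }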
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-2-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Use the non-atomic version of set_bit() in arch/arm64/kernel/stacktrace.c,
as there are no concurrent accesses to frame->prev_type.
This speeds up stack trace collection and improves the boot time of
Generic KASAN by 2-5%.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/23dfa36d1cc91e4a1059945b7834eac22fb9854d.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Disable KASAN instrumentation of arch/arm64/kernel/stacktrace.c.
This speeds up Generic KASAN by 5-20%.
As a side-effect, KASAN is now unable to detect bugs in the stack trace
collection code. This is taken as an acceptable downside.
Also replace READ_ONCE_NOCHECK() with READ_ONCE() in stacktrace.c.
As the file is now not instrumented, there is no need to use the
NOCHECK version of READ_ONCE().
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/c4c944a2a905e949760fbeb29258185087171708.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
For historical reasons, the naming of parameters and their types in the
arm64 stacktrace code differs from that used in generic code and other
architectures, even though the types are equivalent.
For consistency and clarity, use the generic names.
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rename "struct stackframe" to "struct unwind_state" for consistency and
better naming. Accordingly, rename variable/argument "frame" to "state".
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rename unwinder functions for consistency and better naming.
- Rename start_backtrace() to unwind_init().
- Rename unwind_frame() to unwind_next().
- Rename walk_stackframe() to unwind().
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that arm64 uses arch_stack_walk() consistently, struct stackframe is
only used within stacktrace.c. To make it easier to read and maintain
this code, it would be nicer if the definition were there too.
Move the definition into stacktrace.c.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The comment at the top of stacktrace.c isn't all that helpful, as it's
not associated with the code which inspects the frame record, and the
code example isn't representative of common code generation today.
Delete it.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, there is a check for a NULL task in unwind_frame(). It is not
needed since all current callers pass a non-NULL task.
There should be no functional change as a result of this patch.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series.
Link: https://lore.kernel.org/r/20220413145910.3060139-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark start_backtrace() as notrace and NOKPROBE_SYMBOL because this
function is called from ftrace and lockdep to get the caller address
via return_address(). Since lockdep is used from kprobes, it should
also be NOKPROBE_SYMBOL.
Fixes: b07f349966 ("arm64: stacktrace: Move start_backtrace() out of the header")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that open-coded stack unwinds have been converted to
arch_stack_walk(), we no longer need to expose any of unwind_frame(),
walk_stackframe(), or start_backtrace() outside of stacktrace.c.
Make those functions private to stacktrace.c, removing their prototypes
from <asm/stacktrace.h> and marking them static.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic.
Currently, dump_backtrace() walks the stack of the current task or a
blocked task by calling start_backtrace() and iterating unwind steps
using unwind_frame(). This can be written more simply in terms of
arch_stack_walk(), considering three distinct cases:
1) When unwinding a blocked task, start_backtrace() is called with the
blocked task's saved PC and FP, and the unwind proceeds immediately
from this point without skipping any entries. This is functionally
equivalent to calling arch_stack_walk() with the blocked task, which
will start with the task's saved PC and FP.
There is no functional change to this case.
2) When unwinding the current task without regs, start_backtrace() is
called with dump_backtrace() as the PC and __builtin_frame_address(0)
as the next frame, and the unwind proceeds immediately without
skipping. This is *almost* functionally equivalent to calling
arch_stack_walk() for the current task, which will start with its
caller (i.e. an offset into dump_backtrace()) as the PC, and the
caller's frame record as the next frame.
The only difference being that dump_backtrace() will be reported with
an offset (which is strictly more correct than currently). Otherwise
there is no functional change to this case.
3) When unwinding the current task with regs, start_backtrace() is
called with dump_backtrace() as the PC and __builtin_frame_address(0)
as the next frame, and the unwind is performed silently until the
next frame is the frame pointed to by regs->fp. Reporting starts
from regs->pc and continues from the frame in regs->fp.
Historically, this pre-unwind was necessary to correctly record
return addresses rewritten by the ftrace graph caller, but this is
no longer necessary as these are now recovered using the FP since
commit:
c6d3cd32fd ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR")
This pre-unwind is not necessary to recover return addresses
rewritten by kretprobes, which historically were not recovered, and
are now recovered using the FP since commit:
cd9bc2c925 ("arm64: Recover kretprobe modified return address in stacktrace")
Thus, this is functionally equivalent to calling arch_stack_walk()
with the current task and regs, which will start with regs->pc as the
PC and regs->fp as the next frame, without a pre-unwind.
This patch makes dump_backtrace() use arch_stack_walk(). This simplifies
dump_backtrace() and will permit subsequent changes to the unwind code.
Aside from the improved reporting when unwinding current without regs,
there should be no functional change as a result of this patch.
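A condensed sketch of the reworked dump_backtrace() (illustrative; the
exact printk format and error handling may differ):

  static bool dump_backtrace_entry(void *arg, unsigned long where)
  {
          char *loglvl = arg;

          /* Print one entry per unwound frame. */
          printk("%s %pS\n", loglvl, (void *)where);
          return true;
  }

  void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
                      const char *loglvl)
  {
          if (regs && user_mode(regs))
                  return;

          if (!tsk)
                  tsk = current;

          if (!try_get_task_stack(tsk))
                  return;

          printk("%sCall trace:\n", loglvl);
          arch_stack_walk(dump_backtrace_entry, (char *)loglvl, tsk, regs);

          put_task_stack(tsk);
  }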
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: elaborate commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.
Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
tracer is in use, unwind_frame() may erroneously associate a traced
function with an incorrect return address. This can happen when starting
an unwind from a pt_regs, or when unwinding across an exception
boundary.
This can be seen when recording with perf while the function graph
tracer is in use. For example:
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # perf record -g -e raw_syscalls:sys_enter:k /bin/true
| # perf report
... reports the callchain erroneously as:
| el0t_64_sync
| el0t_64_sync_handler
| el0_svc_common.constprop.0
| perf_callchain
| get_perf_callchain
| syscall_trace_enter
| syscall_trace_enter
... whereas when the function graph tracer is not in use, it reports:
| el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| el0_svc_common.constprop.0
| syscall_trace_enter
| syscall_trace_enter
The underlying problem is that ftrace_graph_get_ret_stack() takes an
index offset from the most recent entry added to the fgraph return
stack. We start an unwind at offset 0, and increment the offset each
time we encounter a rewritten return address (i.e. when we see
`return_to_handler`). This is broken in two cases:
1) Between creating a pt_regs and starting the unwind, function calls
may place entries on the stack, leaving an arbitrary offset which we
can only determine by performing a full unwind from the caller of the
unwind code (and relying on none of the unwind code being
instrumented).
This can result in erroneous entries being reported in a backtrace
recorded by perf or kfence when the function graph tracer is in use.
Currently show_regs() is unaffected as dump_backtrace() performs an
initial unwind.
2) When unwinding across an exception boundary (whether continuing an
unwind or starting a new unwind from regs), we currently always skip
the LR of the interrupted context. Where this was live and contained
a rewritten address, we won't consume the corresponding fgraph ret
stack entry, leaving subsequent entries off-by-one.
This can result in erroneous entries being reported in a backtrace
performed by any in-kernel unwinder when that backtrace crosses an
exception boundary, with entries after the boundary being reported
incorrectly. This includes perf, kfence, show_regs(), panic(), etc.
To fix this, we need to be able to uniquely identify each rewritten
return address such that we can map this back to the original return
address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
each rewritten return address with a unique location on the stack. As
the return address is passed in the LR (and so is not guaranteed a
unique location in memory), we use the FP upon entry to the function
(i.e. the address of the caller's frame record) as the return address
pointer. Any nested call will have a different FP value as the caller
must create its own frame record and update FP to point to this.
Since ftrace_graph_ret_addr() requires the return address with the PAC
stripped, the stripping of the PAC is moved before the fixup of the
rewritten address. As we would unconditionally strip the PAC, moving
this earlier is not harmful, and we can avoid a redundant strip in the
return address fixup code.
I've tested this with the perf case above, the ftrace selftests, and
a number of ad-hoc unwinder tests. The tests all pass, and I have seen
no unexpected behaviour as a result of this change. I've tested with
pointer authentication under QEMU TCG where magic-sysrq+l correctly
recovers the original return addresses.
Note that this doesn't fix the issue of skipping a live LR at an
exception boundary, which is a more general problem and requires more
substantial rework. Were we to consume the LR in all cases this would
result in warnings where the interrupted context's LR contains
`return_to_handler`, but the FP has been altered, e.g.
| func:
| <--- ftrace entry ---> // logs FP & LR, rewrites LR
| STP FP, LR, [SP, #-16]!
| MOV FP, SP
| <--- INTERRUPT --->
... as ftrace_graph_get_ret_stack() will not find a matching entry,
triggering the WARN_ON_ONCE() in unwind_frame().
Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Since a kretprobe replaces the function return address with the
kretprobe_trampoline on the stack, the stack unwinder shows it
instead of the correct return address.
This checks whether the next return address is __kretprobe_trampoline(),
and if so, tries to find the correct return address from the kretprobe
instance list. For this purpose, this adds a 'kr_cur' loop cursor to
memorize the current kretprobe instance.
With this fix, arm64 can now enable
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE and pass the
kprobe self tests.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When the function_graph tracer is in use, arch_stack_walk() may unwind
the stack incorrectly, erroneously reporting itself, missing the final
entry which is being traced, and reporting all traced entries between
these off-by-one from where they should be.
When ftrace hooks a function return, the original return address is
saved to the fgraph ret_stack, and the return address in the LR (or the
function's frame record) is replaced with `return_to_handler`.
When arm64's unwinder encounters frames returning to `return_to_handler`,
it finds the associated original return address from the fgraph ret
stack, assuming the most recent `return_to_handler` entry on the stack
corresponds to the most recent entry in the fgraph ret stack, and so on.
When arch_stack_walk() is used to dump the current task's stack, it
starts from the caller of arch_stack_walk(). However, arch_stack_walk()
can be traced, and so may push an entry on to the fgraph ret stack,
leaving the fgraph ret stack offset by one from the expected position.
This can be seen when dumping the stack via /proc/self/stack, where
enabling the graph tracer results in an unexpected
`stack_trace_save_tsk` entry at the start of the trace, and `el0_svc`
missing from the end of the trace.
This patch fixes this by marking arch_stack_walk() as notrace, as we do
for all other functions on the path to ftrace_graph_get_ret_stack().
While a few helper functions are not marked notrace, their calls/returns
are balanced, and will have no observable effect when examining the
fgraph ret stack.
It is possible for an exception boundary to cause a similar offset if the
return address of the interrupted context was in the LR. Fixing those
cases will require some more substantial rework, and is left for
subsequent patches.
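A minimal sketch of the fix (only the function attributes change;
the prototype below assumes arm64's arch_stack_walk() of the time):
| /* notrace: never push a fgraph ret_stack entry for the unwinder itself */
| noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
| 				      void *cookie, struct task_struct *task,
| 				      struct pt_regs *regs)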
Before:
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0xa4/0x110
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
After:
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210802164845.45506-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Let's use the new printk format to print the stacktrace entry when
printing a backtrace to the kernel logs. This will include any module's
build ID[1] in it so that offline/crash debugging can easily locate the
debuginfo for a module via something like debuginfod[2].
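For example, where the backtrace code prints a symbolic entry, the plain
symbol format can be switched to the build-ID variant (a sketch; the
exact call site may differ, and 'loglvl'/'where' stand in for the
caller's arguments):
| /* %pSb is %pS plus the module build ID, e.g. "foo+0x10/0x80 [bar <id>]" */
| printk("%s %pSb\n", loglvl, (void *)where);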
Link: https://lkml.kernel.org/r/20210511003845.2429846-7-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The AAPCS places no requirements on the alignment of the frame
record. In theory it could be placed anywhere, although it seems
sensible to require it to be aligned to 8 bytes. With an upcoming
enhancement to tag-based KASAN, Clang will begin creating frame records
located at an address that is only aligned to 8 bytes. Accommodate
such frame records in the stack unwinding code.
As pointed out by Mark Rutland, the userspace stack unwinding code
has the same problem, so fix it there as well.
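In the unwinder this amounts to relaxing the frame pointer alignment
check from 16 to 8 bytes (a sketch of the kernel-side check; the
userspace unwinder needs the analogous change):
| 	/* Frame records need only be 8-byte aligned (previously 16) */
| 	if (fp & 0x7)
| 		return -EINVAL;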
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ia22c375230e67ca055e9e4bb639383567f7ad268
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210526174927.2477847-2-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
unwind_frame() was previously implicitly checking that the frame
record is in bounds of the stack by enforcing that FP is both aligned
to 16 and in bounds of the stack. Once the FP alignment requirement
is relaxed to 8 this will not be sufficient because it does not
account for the case where FP points to 8 bytes before the end of the
stack.
Make the check explicit by changing the on_*stack functions to take a
size argument and adjusting the callers to pass the appropriate sizes.
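A sketch of the reworked helper (the stack_info/type bookkeeping of the
real on_*stack() helpers is elided here):
| static inline bool on_stack(unsigned long sp, unsigned long size,
| 			    unsigned long low, unsigned long high)
| {
| 	if (!low)
| 		return false;
| 	/* Reject wrap-around and objects that straddle the stack end */
| 	if (sp < low || sp + size < sp || sp + size > high)
| 		return false;
| 	return true;
| }
With this, a frame record is validated as a 16-byte object starting at
FP rather than by FP's alignment alone.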
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ib7a3eb3eea41b0687ffaba045ceb2012d077d8b4
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210526174927.2477847-1-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already set up upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
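A sketch of that check (as at the start of unwind_frame() at the time):
| 	/* Final frame; nothing to unwind */
| 	if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
| 		return -ENOENT;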
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record, probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry too
late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the PMR
register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog probe
failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model; it saves us from
having to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting requirements
on the configuration of traps.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmCVba0ACgkQa9axLQDI
XvEClxAAsqigp+Mnotdr8YUOuXLjHWU41EMShV6WbFcmlViEyZxxtZ5qavw19T3L
rPxb8hq9QqI8kCd+j4MAU7cdc0ry+047njJmQ3Va0WeiDsbgEfPvLWPguDbeDFXW
EjKKib+F/u58IffDkn6rVA7ZVPgYHRH+8yw6EdApp0BN4JuxEFzGBzG4EWKXnNHH
IOu4IIXlbLX+U1kTtUFR4u6i4uBs2pZdEYzo1NF/Joacg14F01CBRuh8U04eeWFD
HF4pWd4eCl/bLYPurF1rOi1dIUyrPuaPgNInGEdSaocD0hIvQH0r0wyIt+aMmqvK
9Jm+dDEGeLxQn2nDrXfyldYG5EbFa3OplkUt2MVDDMWwN2Gpsjlnf/ucff/SBT/N
7D6AL2OH6KDDCsNgU1JH9H6rAlh4nWJcsMBrWmP7aQtBMRyccQLywrt4HXB8cy7E
+MyhTit05P3lpsrK2uZSFujK35Ts8hxywA7lAlU7YP4ADKu3Noc6qXSaxZRe+1Gb
O5k3Qdcih0VLE843PjJj8f8fW1ysJW5J60cK9BaZxpB77gNufKkh/hS6YAiA8qkt
PT3J0jk/cgGvwKK54rW52dG7qvDImgUMGkXGKQnEimgb62DatCZ4ZOPC+UoiDiqO
SEd1DSW0Lt1VxVIulAjatVgzIJGM0jGCm9L7/vBguR0+Lahakbg=
=vYok
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"A mix of fixes and clean-ups that turned up too late for the first
pull request:
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry
too late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the
PMR register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog
probe failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model; it saves us from
having to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting
requirements on the configuration of traps"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: Update the stale comment
arm64: Fix the documented event stream frequency
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
arm64: Explicitly document boot requirements for SVE
arm64: Explicitly require that FPSIMD instructions do not trap
arm64: Relax booting requirements for configuration of traps
arm64: cpufeatures: use min and max
arm64: stacktrace: restore terminal records
arm64/vdso: Discard .note.gnu.property sections in vDSO
arm64: doc: Add brk/mmap/mremap() to the Tagged Address ABI Exceptions
psci: Remove unneeded semicolon
ACPI: irq: Prevent unregistering of GIC SGIs
ACPI: GTDT: Don't corrupt interrupt mappings on watchdog probe failure
arm64: Show three registers per line
arm64: remove HAVE_DEBUG_BUGVERBOSE
arm64: alternative: simplify passing alt_region
arm64: Force SPARSEMEM_VMEMMAP as the only memory management model
arm64: vdso32: drop -no-integrated-as flag
We removed the terminal frame records in commit:
6106e1112c ("arm64: remove EL0 exception frame record")
... on the assumption that as we no longer used them to find the pt_regs
at exception boundaries, they were no longer necessary.
However, Leo reports that as an unintended side-effect, this causes
traces which cross secondary_start_kernel to terminate one entry too
late, with a spurious "0" entry.
There are a few ways we could solve this, but as we're planning to use
terminal records for RELIABLE_STACKTRACE, let's revert the logic change
for now, keeping the updated comments and accounting for the changes in
commit:
3c02600144 ("arm64: stacktrace: Report when we reach the end of the stack")
This is effectively a partial revert of commit:
6106e1112c ("arm64: remove EL0 exception frame record")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 6106e1112c ("arm64: remove EL0 exception frame record")
Reported-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210429104813.GA33550@C02TD0UTHF1T.local
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not allow
precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON in
softirq context for crypto performance improvements. The conditional
yield support is modified to take softirqs into account and reduce the
latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers, new
functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmB5xkkACgkQa9axLQDI
XvEBgRAAsr6r8gsBQJP3FDHmbtbVf2ej5QJTCOAQAGHbTt0JH7Pk03pWSBr7h5nF
vsddRDxxeDgB6xd7jWP7EvDaPxHeB0CdSj5gG8EP/ZdOm8sFAwB1ZIHWikgUgSwW
nu6R28yXTMSj+EkyFtahMhTMJ1EMF4sCPuIgAo59ST5w/UMMqLCJByOu4ej6RPKZ
aeSJJWaDLBmbgnTKWxRvCc/MgIx4J/LAHWGkdpGjuMK6SLp38Kdf86XcrklXtzwf
K30ZYeoKq8zZ+nFOsK9gBVlOlocZcbS1jEbN842jD6imb6vKLQtBWrKk9A6o4v5E
XulORWcSBhkZb3ItIU9+6SmelUExf0VeVlSp657QXYPgquoIIGvFl6rCwhrdGMGO
bi6NZKCfJvcFZJoIN1oyhuHejgZSBnzGEcvhvzNdg7ItvOCed7q3uXcGHz/OI6tL
2TZKddzHSEMVfTo0D+RUsYfasZHI1qAiQ0mWVC31c+YHuRuW/K/jlc3a5TXlSBUa
Dwu0/zzMLiqx65ISx9i7XNMrngk55uzrS6MnwSByPoz4M4xsElZxt3cbUxQ8YAQz
jhxTHs1Pwes8i7f4n61ay/nHCFbmVvN/LlsPRpZdwd8JumThLrDolF3tc6aaY0xO
hOssKtnGY4Xvh/WitfJ5uvDb1vMObJKTXQEoZEJh4hlNQDxdeUE=
=6NGI
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not
allow precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON
in softirq context for crypto performance improvements. The
conditional yield support is modified to take softirqs into account
and reduce the latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers,
new functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits)
arm64/sve: Add compile time checks for SVE hooks in generic functions
arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
arm64: pac: Optimize kernel entry/exit key installation code paths
arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere
arm64/sve: Remove redundant system_supports_sve() tests
arm64: fpsimd: run kernel mode NEON with softirqs disabled
arm64: assembler: introduce wxN aliases for wN registers
arm64: assembler: remove conditional NEON yield macros
kasan, arm64: tests supports for HW_TAGS async mode
arm64: mte: Report async tag faults before suspend
arm64: mte: Enable async tag check fault
arm64: mte: Conditionally compile mte_enable_kernel_*()
arm64: mte: Enable TCO in functions that can read beyond buffer limits
kasan: Add report for async mode
arm64: mte: Drop arch_enable_tagging()
kasan: Add KASAN mode kernel parameter
arm64: mte: Add asynchronous mode support
arm64: Get rid of CONFIG_ARM64_VHE
arm64: Cope with CPUs stuck in VHE mode
...
Currently start_backtrace() is a static inline function in the header.
Since it shouldn't be so performance critical that it actually needs to
be inlined, move it into a C file; this will save anyone else from
scratching their head about why it is defined in the header. As far as
I can see it's only there because it was factored out of the various
callers.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210319174022.33051-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We recently converted arm64 to use arch_stack_walk() in commit:
5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")
The core stacktrace code expects that (when tracing the current task)
arch_stack_walk() starts a trace at its caller, and does not include
itself in the trace. However, arm64's arch_stack_walk() includes itself,
and so traces include one more entry than callers expect. The core
stacktrace code which calls arch_stack_walk() tries to skip a number of
entries to prevent itself appearing in a trace, and the additional entry
prevents skipping one of the core stacktrace functions, leaving this in
the trace unexpectedly.
We can fix this by having arm64's arch_stack_walk() begin the trace with
its caller. The first value returned by the trace will be
__builtin_return_address(0), i.e. the caller of arch_stack_walk(). The
first frame record to be unwound will be __builtin_frame_address(1),
i.e. the caller's frame record. To prevent surprises, arch_stack_walk()
is also marked noinline.
While __builtin_frame_address(1) is not safe in portable code, local GCC
developers have confirmed that it is safe on arm64. To find the caller's
frame record, the builtin can safely dereference the current function's
frame record or (in theory) could stash the original FP into another GPR
at function entry time, neither of which is problematic.
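A sketch of the resulting entry point (assuming the
start_backtrace()/walk_stackframe() internals of the time):
| noinline void arch_stack_walk(stack_trace_consume_fn consume_entry,
| 			      void *cookie, struct task_struct *task,
| 			      struct pt_regs *regs)
| {
| 	struct stackframe frame;
|
| 	if (regs)
| 		start_backtrace(&frame, regs->regs[29], regs->pc);
| 	else if (task == current)
| 		/* Begin at our caller: its frame record and return address */
| 		start_backtrace(&frame,
| 				(unsigned long)__builtin_frame_address(1),
| 				(unsigned long)__builtin_return_address(0));
| 	else
| 		start_backtrace(&frame, thread_saved_fp(task),
| 				thread_saved_pc(task));
|
| 	walk_stackframe(task, &frame, consume_entry, cookie);
| }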
Prior to this patch, the tracing code would unexpectedly show up in
traces of the current task, e.g.
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0x98/0x100
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
After this patch, the tracing code will not show up in such traces:
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
Erring on the side of caution, I've given this a spin with a bunch of
toolchains, verifying the output of /proc/self/stack and checking that
the assembly looked sound. For GCC (where we require version 5.1.0 or
later) I tested with the kernel.org crosstool binaries for versions
5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and
10.1.0. For clang (where we require version 10.0.1 or later) I tested
with the llvm.org binary releases of 11.0.0, and 11.0.1.
Fixes: 5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> # 5.10.x
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>