printk changes for 5.19

-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmKLXH8ACgkQUqAMR0iA lPIABhAAtAZRmvg9UjUS8dpmS3plXdg/zJU0AbK9o/m/hGzMfs2bgHxwM7mbGa1O VC0Jczj9tfJXESfrBsV0ZpY5H+iGilEkTF86/ME4sS8lmIeSim9dAxF4sTvM1vw/ IST4llN0IRuNHwrb20GyH44MOG9JwFwEyIgYITwkB8iYK/lo/sP8xkZuC44CmaJf 28ZZAwICigtyR9lF0psQGLgMc4+laT5l3XF/c9OyqEFbB5khBGxT0RwV0WS4ZcPA mTn5kW6WcDbTNKUVUHW1jzmJBq3ci+0ckh6jLNJWc6Olh5jbGU7selVTst96GQKm sgWF7uykURls3ZFPzTJSY6E3Gnwrsw75RQYDLtTOSxqB2NlVsBTyZq4jgNtxiR3z ovA9souDe4t/BPqkHTHZkVEyaFWZlRwNlzJZIwN2Auy/uFjznWnOQxT2t3BYUZt5 8qnUt+JBvtSNyLDvoNtQnyCiCyEZdyrHQ+3RsFWIQz6CnA34Xh6oZPxbK24pnfDy F5OuIulrpIPfEFufV6ZR30QeB2gLkvCorUfl5pde4QL/Pujxrk6CCikv39QOfL7K 6+X7hq/Moq8vhzMfWl+LEPS6qpAwNJl69JIaQrp18JHVGeKVagS1e6pOmThSOPv7 bDucE08oOK8KTnR6ysfKf24JC6HopB7vFYfhSEa8rgssDLtcGso= =pN3o -----END PGP SIGNATURE----- Merge tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Offload writing printk() messages on consoles to per-console kthreads. It prevents soft-lockups when an extensive amount of messages is printed. It was observed, for example, during boot of large systems with a lot of peripherals like disks or network interfaces. It prevents live-lockups that were observed, for example, when messages about allocation failures were reported and a CPU handled consoles instead of reclaiming the memory. It was hard to solve even with rate limiting because it would need to take into account the amount of messages and the speed of all consoles. It is a must to have for real time. Otherwise, any printk() might break latency guarantees. The per-console kthreads allow to handle each console on its own speed. Slow consoles do not longer slow down faster ones. And printk() does not longer unpredictably slows down various code paths. There are situations when the kthreads are either not available or not reliable, for example, early boot, suspend, or panic. In these situations, printk() uses the legacy mode and tries to handle consoles immediately. - Add documentation for the printk index. * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk, tracing: fix console tracepoint printk: remove @console_locked printk: extend console_lock for per-console locking printk: add kthread console printers printk: add functions to prefer direct printing printk: add pr_flush() printk: move buffer definitions into console_emit_next_record() caller printk: refactor and rework printing logic printk: add con_printk() macro for console details printk: call boot_delay_msec() in printk_delay() printk: get caller_id/timestamp after migration disable printk: wake waiters for safe and NMI contexts printk: wake up all waiters printk: add missing memory barrier to wake_up_klogd() printk: cpu sync always disable interrupts printk: rename cpulock functions printk/index: Printk index feature documentation MAINTAINERS: Add printk indexing maintainers on mention of printk_index
2022-05-25 10:32:08 -07:00 · 2022-05-25 10:32:08 -07:00 · 537e62c865
parent 2e17ce1106 1c6fd59943
commit 537e62c865
15 changed files with 1194 additions and 341 deletions
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@ -20,6 +20,7 @@ it.
   workqueue
   printk-basics
   printk-formats
+   printk-index
   symbol-namespaces

 Data structures and low-level utilities
--- a/Documentation/core-api/printk-index.rst
+++ b/Documentation/core-api/printk-index.rst
@ -0,0 +1,137 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Printk Index
+============
+
+There are many ways how to monitor the state of the system. One important
+source of information is the system log. It provides a lot of information,
+including more or less important warnings and error messages.
+
+There are monitoring tools that filter and take action based on messages
+logged.
+
+The kernel messages are evolving together with the code. As a result,
+particular kernel messages are not KABI and never will be!
+
+It is a huge challenge for maintaining the system log monitors. It requires
+knowing what messages were updated in a particular kernel version and why.
+Finding these changes in the sources would require non-trivial parsers.
+Also it would require matching the sources with the binary kernel which
+is not always trivial. Various changes might be backported. Various kernel
+versions might be used on different monitored systems.
+
+This is where the printk index feature might become useful. It provides
+a dump of printk formats used all over the source code used for the kernel
+and modules on the running system. It is accessible at runtime via debugfs.
+
+The printk index helps to find changes in the message formats. Also it helps
+to track the strings back to the kernel sources and the related commit.
+
+
+User Interface
+==============
+
+The index of printk formats are split in into separate files. The files are
+named according to the binaries where the printk formats are built-in. There
+is always "vmlinux" and optionally also modules, for example::
+
+   /sys/kernel/debug/printk/index/vmlinux
+   /sys/kernel/debug/printk/index/ext4
+   /sys/kernel/debug/printk/index/scsi_mod
+
+Note that only loaded modules are shown. Also printk formats from a module
+might appear in "vmlinux" when the module is built-in.
+
+The content is inspired by the dynamic debug interface and looks like::
+
+   $> head -1 /sys/kernel/debug/printk/index/vmlinux; shuf -n 5 vmlinux
+   # <level[,flags]> filename:line function "format"
+   <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
+   <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
+   <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
+   <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
+   <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
+
+, where the meaning is:
+
+   - :level: log level value: 0-7 for particular severity, -1 as default,
+	'c' as continuous line without an explicit log level
+   - :flags: optional flags: currently only 'c' for KERN_CONT
+   - :filename\:line: source filename and line number of the related
+	printk() call. Note that there are many wrappers, for example,
+	pr_warn(), pr_warn_once(), dev_warn().
+   - :function: function name where the printk() call is used.
+   - :format: format string
+
+The extra information makes it a bit harder to find differences
+between various kernels. Especially the line number might change
+very often. On the other hand, it helps a lot to confirm that
+it is the same string or find the commit that is responsible
+for eventual changes.
+
+
+printk() Is Not a Stable KABI
+=============================
+
+Several developers are afraid that exporting all these implementation
+details into the user space will transform particular printk() calls
+into KABI.
+
+But it is exactly the opposite. printk() calls must _not_ be KABI.
+And the printk index helps user space tools to deal with this.
+
+
+Subsystem specific printk wrappers
+==================================
+
+The printk index is generated using extra metadata that are stored in
+a dedicated .elf section ".printk_index". It is achieved using macro
+wrappers doing __printk_index_emit() together with the real printk()
+call. The same technique is used also for the metadata used by
+the dynamic debug feature.
+
+The metadata are stored for a particular message only when it is printed
+using these special wrappers. It is implemented for the commonly
+used printk() calls, including, for example, pr_warn(), or pr_once().
+
+Additional changes are necessary for various subsystem specific wrappers
+that call the original printk() via a common helper function. These needs
+their own wrappers adding __printk_index_emit().
+
+Only few subsystem specific wrappers have been updated so far,
+for example, dev_printk(). As a result, the printk formats from
+some subsystes can be missing in the printk index.
+
+
+Subsystem specific prefix
+=========================
+
+The macro pr_fmt() macro allows to define a prefix that is printed
+before the string generated by the related printk() calls.
+
+Subsystem specific wrappers usually add even more complicated
+prefixes.
+
+These prefixes can be stored into the printk index metadata
+by an optional parameter of __printk_index_emit(). The debugfs
+interface might then show the printk formats including these prefixes.
+For example, drivers/acpi/osl.c contains::
+
+  #define pr_fmt(fmt) "ACPI: OSL: " fmt
+
+  static int __init acpi_no_auto_serialize_setup(char *str)
+  {
+	acpi_gbl_auto_serialize_methods = FALSE;
+	pr_info("Auto-serialization disabled\n");
+
+	return 1;
+  }
+
+This results in the following printk index entry::
+
+  <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
+
+It helps matching messages from the real log with printk index.
+Then the source file name, line number, and function name can
+be used to match the string with the source code.
--- a/2
+++ b/2
@ -15923,7 +15923,9 @@ F:	kernel/printk/
 PRINTK INDEXING
 R:	Chris Down <chris@chrisdown.name>
 S:	Maintained
+F:	Documentation/core-api/printk-index.rst
 F:	kernel/printk/index.c
+K:	printk_index

 PROC FILESYSTEM
 L:	linux-kernel@vger.kernel.org
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@ -578,6 +578,7 @@ void __handle_sysrq(int key, bool check_mask)

 	rcu_sysrq_start();
 	rcu_read_lock();
+	printk_prefer_direct_enter();
 	/*
 	 * Raise the apparent loglevel to maximum so that the sysrq header
 	 * is shown to provide the user with positive feedback.  We do not
@ -619,6 +620,7 @@ void __handle_sysrq(int key, bool check_mask)
 		pr_cont("\n");
 		console_loglevel = orig_log_level;
 	}
+	printk_prefer_direct_exit();
 	rcu_read_unlock();
 	rcu_sysrq_end();

--- a/include/linux/console.h
+++ b/include/linux/console.h
@ -16,6 +16,7 @@

 #include <linux/atomic.h>
 #include <linux/types.h>
+#include <linux/mutex.h>

 struct vc_data;
 struct console_font_op;
@ -151,6 +152,24 @@ struct console {
 	int	cflag;
 	uint	ispeed;
 	uint	ospeed;
+	u64	seq;
+	unsigned long dropped;
+	struct task_struct *thread;
+	bool	blocked;
+
+	/*
+	 * The per-console lock is used by printing kthreads to synchronize
+	 * this console with callers of console_lock(). This is necessary in
+	 * order to allow printing kthreads to run in parallel to each other,
+	 * while each safely accessing the @blocked field and synchronizing
+	 * against direct printing via console_lock/console_unlock.
+	 *
+	 * Note: For synchronizing against direct printing via
+	 *       console_trylock/console_unlock, see the static global
+	 *       variable @console_kthreads_active.
+	 */
+	struct mutex lock;
+
 	void	*data;
 	struct	 console *next;
 };
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@ -170,6 +170,11 @@ extern void __printk_safe_exit(void);
 #define printk_deferred_enter __printk_safe_enter
 #define printk_deferred_exit __printk_safe_exit

+extern void printk_prefer_direct_enter(void);
+extern void printk_prefer_direct_exit(void);
+
+extern bool pr_flush(int timeout_ms, bool reset_on_progress);
+
 /*
 * Please don't use printk_ratelimit(), because it shares ratelimiting state
 * with all other unrelated printk_ratelimit() callsites.  Instead use
@ -220,6 +225,19 @@ static inline void printk_deferred_exit(void)
 {
 }

+static inline void printk_prefer_direct_enter(void)
+{
+}
+
+static inline void printk_prefer_direct_exit(void)
+{
+}
+
+static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
+{
+	return true;
+}
+
 static inline int printk_ratelimit(void)
 {
 	return 0;
@ -277,46 +295,58 @@ static inline void printk_trigger_flush(void)
 #endif

 #ifdef CONFIG_SMP
-extern int __printk_cpu_trylock(void);
-extern void __printk_wait_on_cpu_lock(void);
-extern void __printk_cpu_unlock(void);
-
-/**
- * printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
- *                             lock and disable interrupts.
- * @flags: Stack-allocated storage for saving local interrupt state,
- *         to be passed to printk_cpu_unlock_irqrestore().
- *
- * If the lock is owned by another CPU, spin until it becomes available.
- * Interrupts are restored while spinning.
- */
-#define printk_cpu_lock_irqsave(flags)		\
-	for (;;) {				\
-		local_irq_save(flags);		\
-		if (__printk_cpu_trylock())	\
-			break;			\
-		local_irq_restore(flags);	\
-		__printk_wait_on_cpu_lock();	\
-	}
-
-/**
- * printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant spinning
- *                                  lock and restore interrupts.
- * @flags: Caller's saved interrupt state, from printk_cpu_lock_irqsave().
- */
-#define printk_cpu_unlock_irqrestore(flags)	\
-	do {					\
-		__printk_cpu_unlock();		\
-		local_irq_restore(flags);	\
-	} while (0)				\
+extern int __printk_cpu_sync_try_get(void);
+extern void __printk_cpu_sync_wait(void);
+extern void __printk_cpu_sync_put(void);

 #else

-#define printk_cpu_lock_irqsave(flags) ((void)flags)
-#define printk_cpu_unlock_irqrestore(flags) ((void)flags)
-
+#define __printk_cpu_sync_try_get() true
+#define __printk_cpu_sync_wait()
+#define __printk_cpu_sync_put()
 #endif /* CONFIG_SMP */

+/**
+ * printk_cpu_sync_get_irqsave() - Disable interrupts and acquire the printk
+ *                                 cpu-reentrant spinning lock.
+ * @flags: Stack-allocated storage for saving local interrupt state,
+ *         to be passed to printk_cpu_sync_put_irqrestore().
+ *
+ * If the lock is owned by another CPU, spin until it becomes available.
+ * Interrupts are restored while spinning.
+ *
+ * CAUTION: This function must be used carefully. It does not behave like a
+ * typical lock. Here are important things to watch out for...
+ *
+ *     * This function is reentrant on the same CPU. Therefore the calling
+ *       code must not assume exclusive access to data if code accessing the
+ *       data can run reentrant or within NMI context on the same CPU.
+ *
+ *     * If there exists usage of this function from NMI context, it becomes
+ *       unsafe to perform any type of locking or spinning to wait for other
+ *       CPUs after calling this function from any context. This includes
+ *       using spinlocks or any other busy-waiting synchronization methods.
+ */
+#define printk_cpu_sync_get_irqsave(flags)		\
+	for (;;) {					\
+		local_irq_save(flags);			\
+		if (__printk_cpu_sync_try_get())	\
+			break;				\
+		local_irq_restore(flags);		\
+		__printk_cpu_sync_wait();		\
+	}
+
+/**
+ * printk_cpu_sync_put_irqrestore() - Release the printk cpu-reentrant spinning
+ *                                    lock and restore interrupts.
+ * @flags: Caller's saved interrupt state, from printk_cpu_sync_get_irqsave().
+ */
+#define printk_cpu_sync_put_irqrestore(flags)	\
+	do {					\
+		__printk_cpu_sync_put();	\
+		local_irq_restore(flags);	\
+	} while (0)
+
 extern int kptr_restrict;

 /**
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@ -127,6 +127,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
+		printk_prefer_direct_enter();
+
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
@ -142,6 +144,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)

 		if (sysctl_hung_task_all_cpu_backtrace)
 			hung_task_show_all_bt = true;
+
+		printk_prefer_direct_exit();
 	}

 	touch_nmi_watchdog();
@ -204,12 +208,17 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	}
 unlock:
 	rcu_read_unlock();
-	if (hung_task_show_lock)
+	if (hung_task_show_lock) {
+		printk_prefer_direct_enter();
 		debug_show_all_locks();
+		printk_prefer_direct_exit();
+	}

 	if (hung_task_show_all_bt) {
 		hung_task_show_all_bt = false;
+		printk_prefer_direct_enter();
 		trigger_all_cpu_backtrace();
+		printk_prefer_direct_exit();
 	}

 	if (hung_task_call_panic)
--- a/kernel/panic.c
+++ b/kernel/panic.c
@ -579,6 +579,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 {
 	disable_trace_on_warning();

+	printk_prefer_direct_enter();
+
 	if (file)
 		pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
 			raw_smp_processor_id(), current->pid, file, line,
@ -608,6 +610,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,

 	/* Just a warning, don't kill lockdep. */
 	add_taint(taint, LOCKDEP_STILL_OK);
+
+	printk_prefer_direct_exit();
 }

 #ifndef __WARN_FLAGS
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@ -647,6 +647,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	printk_prefer_direct_enter();
 	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
@ -684,6 +685,7 @@ static void print_cpu_stall(unsigned long gps)
 	 */
 	set_tsk_need_resched(current);
 	set_preempt_need_resched();
+	printk_prefer_direct_exit();
 }

 static void check_cpu_stall(struct rcu_data *rdp)
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@ -447,9 +447,11 @@ static int __orderly_reboot(void)
 	ret = run_cmd(reboot_cmd);

 	if (ret) {
+		printk_prefer_direct_enter();
 		pr_warn("Failed to start orderly reboot: forcing the issue\n");
 		emergency_sync();
 		kernel_restart(NULL);
+		printk_prefer_direct_exit();
 	}

 	return ret;
@ -462,6 +464,7 @@ static int __orderly_poweroff(bool force)
 	ret = run_cmd(poweroff_cmd);

 	if (ret && force) {
+		printk_prefer_direct_enter();
 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");

 		/*
@ -471,6 +474,7 @@ static int __orderly_poweroff(bool force)
 		 */
 		emergency_sync();
 		kernel_power_off();
+		printk_prefer_direct_exit();
 	}

 	return ret;
@ -528,6 +532,8 @@ EXPORT_SYMBOL_GPL(orderly_reboot);
 */
 static void hw_failure_emergency_poweroff_func(struct work_struct *work)
 {
+	printk_prefer_direct_enter();
+
 	/*
 	 * We have reached here after the emergency shutdown waiting period has
 	 * expired. This means orderly_poweroff has not been able to shut off
@ -544,6 +550,8 @@ static void hw_failure_emergency_poweroff_func(struct work_struct *work)
 	 */
 	pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
 	emergency_restart();
+
+	printk_prefer_direct_exit();
 }

 static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
@ -582,11 +590,13 @@ void hw_protection_shutdown(const char *reason, int ms_until_forced)
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);

+	printk_prefer_direct_enter();
+
 	pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason);

 	/* Shutdown should be initiated only once. */
 	if (!atomic_dec_and_test(&allow_proceed))
-		return;
+		goto out;

 	/*
 	 * Queue a backup emergency shutdown in the event of
@ -594,6 +604,8 @@ void hw_protection_shutdown(const char *reason, int ms_until_forced)
 	 */
 	hw_failure_emergency_poweroff(ms_until_forced);
 	orderly_poweroff(true);
+out:
+	printk_prefer_direct_exit();
 }
 EXPORT_SYMBOL_GPL(hw_protection_shutdown);

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@ -424,6 +424,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		/* Start period for the next softlockup warning. */
 		update_report_ts();

+		printk_prefer_direct_enter();
+
 		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
@ -442,6 +444,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
+
+		printk_prefer_direct_exit();
 	}

 	return HRTIMER_RESTART;
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@ -135,6 +135,8 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		if (__this_cpu_read(hard_watchdog_warn) == true)
 			return;

+		printk_prefer_direct_enter();
+
 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
 			 this_cpu);
 		print_modules();
@ -155,6 +157,8 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");

+		printk_prefer_direct_exit();
+
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;
 	}
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@ -102,9 +102,9 @@ asmlinkage __visible void dump_stack_lvl(const char *log_lvl)
 	 * Permit this cpu to perform nested stack dumps while serialising
 	 * against other CPUs
 	 */
-	printk_cpu_lock_irqsave(flags);
+	printk_cpu_sync_get_irqsave(flags);
 	__dump_stack(log_lvl);
-	printk_cpu_unlock_irqrestore(flags);
+	printk_cpu_sync_put_irqrestore(flags);
 }
 EXPORT_SYMBOL(dump_stack_lvl);

--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@ -99,7 +99,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 		 * Allow nested NMI backtraces while serializing
 		 * against other CPUs.
 		 */
-		printk_cpu_lock_irqsave(flags);
+		printk_cpu_sync_get_irqsave(flags);
 		if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
 				cpu, (void *)instruction_pointer(regs));
@ -110,7 +110,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 			else
 				dump_stack();
 		}
-		printk_cpu_unlock_irqrestore(flags);
+		printk_cpu_sync_put_irqrestore(flags);
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 		return true;
 	}