Commit graph

449 commits

Author SHA1 Message Date
Petr Mladek
21493c6e96 Merge branch 'rework/console-list-lock' into for-linus 2023-01-19 14:56:38 +01:00
Anuradha Weeraman
4fe59a130c kernel/printk/printk.c: Fix W=1 kernel-doc warning
Fix W=1 kernel-doc warning:

kernel/printk/printk.c:
 - Include function parameter in console_lock_spinning_disable_and_check()

Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230116125635.374567-1-anuradha@debian.org
2023-01-16 16:59:17 +01:00
John Ogness
3ef5abd9b5 tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
Several mutexes are taken while setting up console serial ports. In
particular, the tty_port->mutex and @console_mutex are taken:

  serial_pnp_probe
    serial8250_register_8250_port
      uart_add_one_port (locks tty_port->mutex)
        uart_configure_port
          register_console (locks @console_mutex)

In order to synchronize kgdb's tty_find_polling_driver() with
register_console(), commit 6193bc9084 ("tty: serial: kgdboc:
synchronize tty_find_polling_driver() and register_console()") takes
the @console_mutex. However, this leads to the following call chain
(with locking):

  platform_probe
    kgdboc_probe
      configure_kgdboc (locks @console_mutex)
        tty_find_polling_driver
          uart_poll_init (locks tty_port->mutex)
            uart_set_options

This is clearly deadlock potential due to the reverse lock ordering.

Since uart_set_options() requires holding @console_mutex in order to
serialize early initialization of the serial-console lock, take the
@console_mutex in uart_poll_init() instead of configure_kgdboc().

Since configure_kgdboc() was using @console_mutex for safe traversal
of the console list, change it to use the SRCU iterator instead.

Add comments to uart_set_options() kerneldoc mentioning that it
requires holding @console_mutex (aka the console_list_lock).

Fixes: 6193bc9084 ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Export console_srcu_read_lock_is_held() to fix build kgdboc as a module.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230112161213.1434854-1-john.ogness@linutronix.de
2023-01-16 16:44:53 +01:00
Petr Mladek
6b2b0d839a Merge branch 'rework/console-list-lock' into for-linus 2022-12-08 11:46:56 +01:00
John Ogness
5074ffbec6 printk: htmldocs: add missing description
Variable and return descriptions were missing from the SRCU read
lock functions. Add them.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/87zgcjpdvo.fsf@jogness.linutronix.de
2022-12-02 11:25:02 +01:00
John Ogness
848a9c1066 printk: relieve console_lock of list synchronization duties
The console_list_lock provides synchronization for console list and
console->flags updates. All call sites that were using the console_lock
for this synchronization have either switched to use the
console_list_lock or the SRCU list iterator.

Remove console_lock usage for console list updates and console->flags
updates.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-40-john.ogness@linutronix.de
2022-12-02 11:25:02 +01:00
John Ogness
6f8836756f printk, xen: fbfront: create/use safe function for forcing preferred
With commit 9e124fe16ff2("xen: Enable console tty by default in domU
if it's not a dummy") a hack was implemented to make sure that the
tty console remains the console behind the /dev/console device. The
main problem with the hack is that, after getting the console pointer
to the tty console, it is assumed the pointer is still valid after
releasing the console_sem. This assumption is incorrect and unsafe.

Make the hack safe by introducing a new function
console_force_preferred_locked() and perform the full operation
under the console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-33-john.ogness@linutronix.de
2022-12-02 11:25:02 +01:00
John Ogness
1fd4224a6b console: introduce console_is_registered()
Currently it is not possible for drivers to detect if they have
already successfully registered their console. Several drivers
have multiple paths that lead to console registration. To avoid
attempting a 2nd registration (which leads to a WARN), drivers
are implementing their own solution.

Introduce console_is_registered() so drivers can easily identify
if their console is currently registered. A _locked() variant
is also provided if the caller is already holding the
console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-22-john.ogness@linutronix.de
2022-12-02 11:25:01 +01:00
John Ogness
8cb15f7f49 printk: console_device: use srcu console list iterator
Use srcu console list iteration for console list traversal. It is
acceptable because the consoles might come and go at any time.
Strict synchronizing with console registration code would not bring
any advantage over srcu.

Document why the console_lock is still necessary. Note that this
is a preparatory change for when console_lock no longer provides
synchronization for the console list.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-21-john.ogness@linutronix.de
2022-12-02 11:25:01 +01:00
John Ogness
87f2e4b7d9 printk: console_flush_on_panic: use srcu console list iterator
With SRCU it is now safe to traverse the console list, even if
the console_trylock() failed. However, overwriting console->seq
when console_trylock() failed is still an issue.

Switch to SRCU iteration and document remaining issue with
console->seq.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-20-john.ogness@linutronix.de
2022-12-02 11:25:01 +01:00
John Ogness
d792db6f6b printk: console_unblank: use srcu console list iterator
Use srcu console list iteration for console list traversal.

Document why the console_lock is still necessary. Note that this
is a preparatory change for when console_lock no longer provides
synchronization for the console list.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-19-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
12f1da5fc4 printk: console_is_usable: use console_srcu_read_flags
All users of console_is_usable() are SRCU iterators. Use the
appropriate wrapper function to locklessly read the flags.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-18-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
eb7f1ed250 printk: __pr_flush: use srcu console list iterator
Use srcu console list iteration for console list traversal.

Document why the console_lock is still necessary. Note that this
is a preparatory change for when console_lock no longer provides
synchronization for the console list.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-17-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
fc956ae0de printk: console_flush_all: use srcu console list iterator
Guarantee safe iteration of the console list by using SRCU.

Note that in the case of a handover, the SRCU read lock is also
released. This is documented in the function description and as
comments in the code. It is a bit tricky, but this preserves the
lockdep lock ordering for the context handing over the
console_lock:

  console_lock()
  | mutex_acquire(&console_lock_dep_map)       <-- console lock
  |
  console_unlock()
  | console_flush_all()
  | | srcu_read_lock(&console_srcu)            <-- srcu lock
  | | console_emit_next_record()
  | | | console_lock_spinning_disable_and_check()
  | | | | srcu_read_unlock(&console_srcu)      <-- srcu unlock
  | | | | mutex_release(&console_lock_dep_map) <-- console unlock

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-16-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
100bdef2c1 console: introduce wrappers to read/write console flags
After switching to SRCU for console list iteration, some readers
will begin readings console->flags as a data race. Locklessly
reading console->flags provides a consistent value because there
is at most one CPU modifying console->flags and that CPU is
using only read-modify-write operations.

Introduce a wrapper for SRCU iterators to read console flags.
Introduce a matching wrapper to write to flags of registered
consoles. Writing to flags of registered consoles is synchronized
by the console_list_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-13-john.ogness@linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
4dc64682ad printk: introduce console_list_lock
Currently there exist races in register_console(), where the types
of registered consoles are checked (without holding the console_lock)
and then after acquiring the console_lock, it is assumed that the list
has not changed. Also, some code that performs console_unregister()
make similar assumptions.

It might be possible to fix these races using the console_lock. But
it would require a complex analysis of all console drivers to make
sure that the console_lock is not taken in match() and setup()
callbacks. And we really prefer to split up and reduce the
responsibilities of console_lock rather than expand its complexity.
Therefore, introduce a new console_list_lock to provide full
synchronization for any console list changes.

In addition, also use console_list_lock for synchronization of
console->flags updates. All flags are either static or modified only
during the console registration. There are only two exceptions.

The first exception is CON_ENABLED, which is also modified by
console_start()/console_stop(). Therefore, these functions must
also take the console_list_lock.

The second exception is when the flags are modified by the console
driver init code before the console is registered. These will be
ignored because they are not visible to the rest of the system
via the console_drivers list.

Note that one of the various responsibilities of the console_lock is
also intended to provide console list and console->flags
synchronization. Later changes will update call sites relying on the
console_lock for these purposes. Once all call sites have been
updated, the console_lock will be relieved of synchronizing
console_list and console->flags updates.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/87sficwokr.fsf@jogness.linutronix.de
2022-12-02 11:25:00 +01:00
John Ogness
a424276093 printk: fix setting first seq for consoles
It used to be that all consoles were synchronized with respect to
which message they were printing. After commit a699449bb1 ("printk:
refactor and rework printing logic"), all consoles have their own
@seq for tracking which message they are on. That commit also changed
how the initial sequence number was chosen. Instead of choosing the
next non-printed message, it chose the sequence number of the next
message that will be added to the ringbuffer.

That change created a possibility that a non-boot console taking over
for a boot console might skip messages if the boot console was behind
and did not have a chance to catch up before being unregistered.

Since it is not known which boot console is the same device, flush
all consoles and, if necessary, start with the message of the enabled
boot console that is the furthest behind. If no boot consoles are
enabled, begin with the next message that will be added to the
ringbuffer.

Also, since boot consoles are meant to be used at boot time, handle
them the same as CON_PRINTBUFFER to ensure that no initial messages
are skipped.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-7-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
John Ogness
b80ea0e81b printk: move @seq initialization to helper
The code to initialize @seq for a new console needs to consider
more factors when choosing an initial value. Move the code into
a helper function console_init_seq() "as is" so this code can
be expanded without causing register_console() to become too
long. A later commit will implement the additional code.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-6-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
John Ogness
1145703612 printk: register_console: use "registered" for variable names
The @bootcon_enabled and @realcon_enabled local variables actually
represent if such console types are registered. In general there
has been a confusion about enabled vs. registered. Incorrectly
naming such variables promotes such confusion.

Rename the variables to _registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-5-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
John Ogness
6c4afa7914 printk: Prepare for SRCU console list protection
Provide an NMI-safe SRCU protected variant to walk the console list.

Note that all console fields are now set before adding the console
to the list to avoid the console becoming visible by SCRU readers
before being fully initialized.

This is a preparatory change for a new console infrastructure which
operates independent of the console BKL.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-4-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
Thomas Gleixner
d9a4af5690 printk: Convert console_drivers list to hlist
Replace the open coded single linked list with a hlist so a conversion
to SRCU protected list walks can reuse the existing primitives.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-3-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
Xu Panda
7365df19e8 printk: use strscpy() to instead of strlcpy()
The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL terminated strings.

Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/202211301601416229001@zte.com.cn
2022-12-01 11:57:51 +01:00
Wang Honghui
7b0592a23e printk: fix a typo of comment
Fix a typo of comment

Signed-off-by: Wang Honghui <honghui.wang@ucas.com.cn>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/77522C532189E547+Y3yG91g6XALbtdJr@TP-P15V.lan
Link: https://lore.kernel.org/r/0C7C980DB815FAE1+Y3yNXJCqZ3Nzxa5V@TP-P15V.lan
2022-11-22 12:10:15 +01:00
Thomas Gleixner
78ba392c84 printk: Mark __printk percpu data ready __ro_after_init
This variable cannot change post boot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220924000454.3319186-6-john.ogness@linutronix.de
2022-09-29 15:20:52 +02:00
Thomas Gleixner
eb4531b346 printk: Remove bogus comment vs. boot consoles
The comment about unregistering boot consoles is just not matching the
reality. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220924000454.3319186-5-john.ogness@linutronix.de
2022-09-29 15:20:44 +02:00
Thomas Gleixner
7fc11a521e printk: Remove write only variable nr_ext_console_drivers
Commit a699449bb1 ("printk: refactor and rework printing logic")
removed the need for @nr_ext_console_drivers. Remove the unneeded
variable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220924000454.3319186-4-john.ogness@linutronix.de
2022-09-29 15:20:36 +02:00
Thomas Gleixner
c60ba2d346 printk: Make pr_flush() static
No user outside the printk code and no reason to export this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220924000454.3319186-2-john.ogness@linutronix.de
2022-09-29 15:20:22 +02:00
John Ogness
9023ca0866 printk: do not wait for consoles when suspended
The console_stop() and console_start() functions call pr_flush().
When suspending, these functions are called by the serial subsystem
while the serial port is suspended. In this scenario, if there are
any pending messages, a call to pr_flush() will always result in a
timeout because the serial port cannot make forward progress. This
causes longer suspend and resume times.

Add a check in pr_flush() so that it will immediately timeout if
the consoles are suspended.

Fixes: 3b604ca812 ("printk: add pr_flush()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220715061042.373640-2-john.ogness@linutronix.de
2022-07-15 10:52:11 +02:00
Petr Mladek
07a22b6194 Revert "printk: add functions to prefer direct printing"
This reverts commit 2bb2b7b57f.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-7-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
5831788afb Revert "printk: add kthread console printers"
This reverts commit 09c5ba0aa2.

This reverts commit b87f02307d.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-6-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
2d9ef940f8 Revert "printk: extend console_lock for per-console locking"
This reverts commit 8e27473211.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-5-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
007eeab7e9 Revert "printk: remove @console_locked"
This reverts commit ab406816fc.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-4-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
05c96b3713 Revert "printk: Block console kthreads when direct printing will be required"
This reverts commit c3230283e2.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-3-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
20fb0c8272 Revert "printk: Wait for the global console lock when the system is going down"
This reverts commit b87f02307d.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-2-pmladek@suse.com
2022-06-23 18:41:40 +02:00
Petr Mladek
b87f02307d printk: Wait for the global console lock when the system is going down
There are reports that the console kthreads block the global console
lock when the system is going down, for example, reboot, panic.

First part of the solution was to block kthreads in these problematic
system states so they stopped handling newly added messages.

Second part of the solution is to wait when for the kthreads when
they are actively printing. It solves the problem when a message
was printed before the system entered the problematic state and
the kthreads managed to step in.

A busy waiting has to be used because panic() can be called in any
context and in an unknown state of the scheduler.

There must be a timeout because the kthread might get stuck or sleeping
and never release the lock. The timeout 10s is an arbitrary value
inspired by the softlockup timeout.

Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220615162805.27962-3-pmladek@suse.com
2022-06-15 22:04:15 +02:00
Petr Mladek
c3230283e2 printk: Block console kthreads when direct printing will be required
There are known situations when the console kthreads are not
reliable or does not work in principle, for example, early boot,
panic, shutdown.

For these situations there is the direct (legacy) mode when printk() tries
to get console_lock() and flush the messages directly. It works very well
during the early boot when the console kthreads are not available at all.
It gets more complicated in the other situations when console kthreads
might be actively printing and block console_trylock() in printk().

The same problem is in the legacy code as well. Any console_lock()
owner could block console_trylock() in printk(). It is solved by
a trick that the current console_lock() owner is responsible for
printing all pending messages. It is actually the reason why there
is the risk of softlockups and why the console kthreads were
introduced.

The console kthreads use the same approach. They are responsible
for printing the messages by definition. So that they handle
the messages anytime when they are awake and see new ones.
The global console_lock is available when there is nothing
to do.

It should work well when the problematic context is correctly
detected and printk() switches to the direct mode. But it seems
that it is not enough in practice. There are reports that
the messages are not printed during panic() or shutdown()
even though printk() tries to use the direct mode here.

The problem seems to be that console kthreads become active in these
situation as well. They steel the job before other CPUs are stopped.
Then they are stopped in the middle of the job and block the global
console_lock.

First part of the solution is to block console kthreads when
the system is in a problematic state and requires the direct
printk() mode.

Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com
Suggested-by: John Ogness <john.ogness@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220615162805.27962-2-pmladek@suse.com
2022-06-15 22:03:38 +02:00
John Ogness
809631e2bf Revert "printk: wake up all waiters"
This reverts commit 938ba4084a.

The wait queue @log_wait never has exclusive waiters, so there
is no need to use wake_up_interruptible_all(). Using
wake_up_interruptible() was the correct function to wake all
waiters.

Since there are no exclusive waiters, erroneously changing
wake_up_interruptible() to wake_up_interruptible_all() did not
result in any behavior change. However, using
wake_up_interruptible_all() on a wait queue without exclusive
waiters is fundamentally wrong.

Go back to using wake_up_interruptible() to wake all waiters.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220526203056.81123-1-john.ogness@linutronix.de
2022-05-27 13:04:46 +02:00
Marco Elver
701850dc0c printk, tracing: fix console tracepoint
The original intent of the 'console' tracepoint per the commit 9510035849
("printk/tracing: Add console output tracing") had been to "[...] record
any printk messages into the trace, regardless of the current console
loglevel. This can help correlate (existing) printk debugging with other
tracing."

Petr points out [1] that calling trace_console_rcuidle() in
call_console_driver() had been the wrong thing for a while, because
"printk() always used console_trylock() and the message was flushed to
the console only when the trylock succeeded. And it was always deferred
in NMI or when printed via printk_deferred()."

With the commit 09c5ba0aa2 ("printk: add kthread console printers"),
things only got worse, and calls to call_console_driver() no longer
happen with typical printk() calls but always appear deferred [2].

As such, the tracepoint can no longer serve its purpose to clearly
correlate printk() calls and other tracing, as well as breaks usecases
that expect every printk() call to result in a callback of the console
tracepoint. Notably, the KFENCE and KCSAN test suites, which want to
capture console output and assume a printk() immediately gives us a
callback to the console tracepoint.

Fix the console tracepoint by moving it into printk_sprint() [3].

One notable difference is that by moving tracing into printk_sprint(),
the 'text' will no longer include the "header" (loglevel and timestamp),
but only the raw message. Arguably this is less of a problem now that
the console tracepoint happens on the printk() call and isn't delayed.

Link: https://lore.kernel.org/all/Ym+WqKStCg%2FEHfh3@alley/ [1]
Link: https://lore.kernel.org/all/CA+G9fYu2kS0wR4WqMRsj2rePKV9XLgOU1PiXnMvpT+Z=c2ucHA@mail.gmail.com/ [2]
Link: https://lore.kernel.org/all/87fslup9dx.fsf@jogness.linutronix.de/ [3]
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Marco Elver <elver@google.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220503073844.4148944-1-elver@google.com
2022-05-06 10:55:24 +02:00
John Ogness
ab406816fc printk: remove @console_locked
The static global variable @console_locked is used to help debug
VT code to make sure that certain code paths are running with
the console_lock held. However, this information is also available
with the static global variable @console_kthreads_blocked (for
locking via console_lock()), and the static global variable
@console_kthreads_active (for locking via console_trylock()).

Remove @console_locked and update is_console_locked() to use the
alternative variables.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-16-john.ogness@linutronix.de
2022-04-26 14:33:59 +02:00
John Ogness
8e27473211 printk: extend console_lock for per-console locking
Currently threaded console printers synchronize against each
other using console_lock(). However, different console drivers
are unrelated and do not require any synchronization between
each other. Removing the synchronization between the threaded
console printers will allow each console to print at its own
speed.

But the threaded consoles printers do still need to synchronize
against console_lock() callers. Introduce a per-console mutex
and a new console boolean field @blocked to provide this
synchronization.

console_lock() is modified so that it must acquire the mutex
of each console in order to set the @blocked field. Console
printing threads will acquire their mutex while printing a
record. If @blocked was set, the thread will go back to sleep
instead of printing.

The reason for the @blocked boolean field is so that
console_lock() callers do not need to acquire multiple console
mutexes simultaneously, which would introduce unnecessary
complexity due to nested mutex locking. Also, a new field
was chosen instead of adding a new @flags value so that the
blocked status could be checked without concern of reading
inconsistent values due to @flags updates from other contexts.

Threaded console printers also need to synchronize against
console_trylock() callers. Since console_trylock() may be
called from any context, the per-console mutex cannot be used
for this synchronization. (mutex_trylock() cannot be called
from atomic contexts.) Introduce a global atomic counter to
identify if any threaded printers are active. The threaded
printers will also check the atomic counter to identify if the
console has been locked by another task via console_trylock().

Note that @console_sem is still used to provide synchronization
between console_lock() and console_trylock() callers.

A locking overview for console_lock(), console_trylock(), and the
threaded printers is as follows (pseudo code):

console_lock()
{
        down(&console_sem);
        for_each_console(con) {
                mutex_lock(&con->lock);
                con->blocked = true;
                mutex_unlock(&con->lock);
        }
        /* console_lock acquired */
}

console_trylock()
{
        if (down_trylock(&console_sem) == 0) {
                if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) {
                        /* console_lock acquired */
                }
        }
}

threaded_printer()
{
        mutex_lock(&con->lock);
        if (!con->blocked) {
		/* console_lock() callers blocked */

                if (atomic_inc_unless_negative(&console_kthreads_active)) {
                        /* console_trylock() callers blocked */

                        con->write();

                        atomic_dec(&console_lock_count);
                }
        }
        mutex_unlock(&con->lock);
}

The console owner and waiter logic now only applies between contexts
that have taken the console_lock via console_trylock(). Threaded
printers never take the console_lock, so they do not have a
console_lock to handover. Tasks that have used console_lock() will
block the threaded printers using a mutex and if the console_lock
is handed over to an atomic context, it would be unable to unblock
the threaded printers. However, the console_trylock() case is
really the only scenario that is interesting for handovers anyway.

@panic_console_dropped must change to atomic_t since it is no longer
protected exclusively by the console_lock.

Since threaded printers remain asleep if they see that the console
is locked, they now must be explicitly woken in __console_unlock().
This means wake_up_klogd() calls following a console_unlock() are
no longer necessary and are removed.

Also note that threaded printers no longer need to check
@console_suspended. The check for the @blocked field implicitly
covers the suspended console case.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/878rrs6ft7.fsf@jogness.linutronix.de
2022-04-26 14:32:00 +02:00
John Ogness
09c5ba0aa2 printk: add kthread console printers
Create a kthread for each console to perform console printing. During
normal operation (@system_state == SYSTEM_RUNNING), the kthread
printers are responsible for all printing on their respective
consoles.

During non-normal operation, console printing is done as it has been:
within the context of the printk caller or within irqwork triggered
by the printk caller, referred to as direct printing.

Since threaded console printers are responsible for all printing
during normal operation, this also includes messages generated via
deferred printk calls. If direct printing is in effect during a
deferred printk call, the queued irqwork will perform the direct
printing. To make it clear that this is the only time that the
irqwork will perform direct printing, rename the flag
PRINTK_PENDING_OUTPUT to PRINTK_PENDING_DIRECT_OUTPUT.

Threaded console printers synchronize against each other and against
console lockers by taking the console lock for each message that is
printed.

Note that the kthread printers do not care about direct printing.
They will always try to print if new records are available. They can
be blocked by direct printing, but will be woken again once direct
printing is finished.

Console unregistration is a bit tricky because the associated
kthread printer cannot be stopped while the console lock is held.
A policy is implemented that states: whichever task clears
con->thread (under the console lock) is responsible for stopping
the kthread. unregister_console() will clear con->thread while
the console lock is held and then stop the kthread after releasing
the console lock.

For consoles that have implemented the exit() callback, the kthread
is stopped before exit() is called.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-14-john.ogness@linutronix.de
2022-04-22 21:30:58 +02:00
John Ogness
2bb2b7b57f printk: add functions to prefer direct printing
Once kthread printing is available, console printing will no longer
occur in the context of the printk caller. However, there are some
special contexts where it is desirable for the printk caller to
directly print out kernel messages. Using pr_flush() to wait for
threaded printers is only possible if the caller is in a sleepable
context and the kthreads are active. That is not always the case.

Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit()
functions to explicitly (and globally) activate/deactivate preferred
direct console printing. The term "direct console printing" refers to
printing to all enabled consoles from the context of the printk
caller. The term "prefer" is used because this type of printing is
only best effort. If the console is currently locked or other
printers are already actively printing, the printk caller will need
to rely on the other contexts to handle the printing.

This preferred direct printing is how all printing has been handled
until now (unless it was explicitly deferred).

When kthread printing is introduced, there may be some unanticipated
problems due to kthreads being unable to flush important messages.
In order to minimize such risks, preferred direct printing is
activated for the primary important messages when the system
experiences general types of major errors. These are:

 - emergency reboot/shutdown
 - cpu and rcu stalls
 - hard and soft lockups
 - hung tasks
 - warn
 - sysrq

Note that since kthread printing does not yet exist, no behavior
changes result from this commit. This is only implementing the
counter and marking the various places where preferred direct
printing is active.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-13-john.ogness@linutronix.de
2022-04-22 21:30:58 +02:00
John Ogness
3b604ca812 printk: add pr_flush()
Provide a might-sleep function to allow waiting for console printers
to catch up to the latest logged message.

Use pr_flush() whenever it is desirable to get buffered messages
printed before continuing: suspend_console(), resume_console(),
console_stop(), console_start(), console_unblank().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-12-john.ogness@linutronix.de
2022-04-22 21:30:58 +02:00
John Ogness
03a749e628 printk: move buffer definitions into console_emit_next_record() caller
Extended consoles print extended messages and do not print messages about
dropped records.

Non-extended consoles print "normal" messages as well as extra messages
about dropped records.

Currently the buffers for these various message types are defined within
the functions that might use them and their usage is based upon the
CON_EXTENDED flag. This will be a problem when moving to kthread printers
because each printer must be able to provide its own buffers.

Move all the message buffer definitions outside of
console_emit_next_record(). The caller knows if extended or dropped
messages should be printed and can specify the appropriate buffers to
use. The console_emit_next_record() and call_console_driver() functions
can know what to print based on whether specified buffers are non-NULL.

With this change, buffer definition/allocation/specification is separated
from the code that does the various types of string printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-11-john.ogness@linutronix.de
2022-04-22 21:30:58 +02:00
John Ogness
a699449bb1 printk: refactor and rework printing logic
Refactor/rework printing logic in order to prepare for moving to
threaded console printing.

- Move @console_seq into struct console so that the current
  "position" of each console can be tracked individually.

- Move @console_dropped into struct console so that the current drop
  count of each console can be tracked individually.

- Modify printing logic so that each console independently loads,
  prepares, and prints its next record.

- Remove exclusive_console logic. Since console positions are
  handled independently, replaying past records occurs naturally.

- Update the comments explaining why preemption is disabled while
  printing from printk() context.

With these changes, there is a change in behavior: the console
replaying the log (formerly exclusive console) will no longer block
other consoles. New messages appear on the other consoles while the
newly added console is still replaying.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-10-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00
John Ogness
1fc0ca9e0d printk: add con_printk() macro for console details
It is useful to generate log messages that include details about
the related console. Rather than duplicate the code to assemble
the details, put that code into a macro con_printk().

Once console printers become threaded, this macro will find more
users.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-9-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00
John Ogness
1f47e8af45 printk: call boot_delay_msec() in printk_delay()
boot_delay_msec() is always called immediately before printk_delay()
so just call it from within printk_delay().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-8-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00
John Ogness
9f0844de49 printk: get caller_id/timestamp after migration disable
Currently the local CPU timestamp and caller_id for the record are
collected while migration is enabled. Since this information is
CPU-specific, it should be collected with migration disabled.

Migration is disabled immediately after collecting this information
anyway, so just move the information collection to after the
migration disabling.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-7-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00
John Ogness
5341b93dea printk: wake waiters for safe and NMI contexts
When printk() is called from safe or NMI contexts, it will directly
store the record (vprintk_store()) and then defer the console output.
However, defer_console_output() only causes console printing and does
not wake any waiters of new records.

Wake waiters from defer_console_output() so that they also are aware
of the new records from safe and NMI contexts.

Fixes: 03fc7f9c99 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-6-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00
John Ogness
938ba4084a printk: wake up all waiters
There can be multiple tasks waiting for new records. They should
all be woken. Use wake_up_interruptible_all() instead of
wake_up_interruptible().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-5-john.ogness@linutronix.de
2022-04-22 21:30:57 +02:00