Commit Graph

330 Commits

Author SHA1 Message Date
Xiaoming Ni fdcd4073fc printk: fix build warning when CONFIG_PRINTK=n
build warning when CONFIG_PRINTK=n

	kernel/printk/printk.c:175:5: warning: no previous prototype for
	 'devkmsg_sysctl_set_loglvl' [-Wmissing-prototypes]

devkmsg_sysctl_set_loglvl() is only used in sysctl.c when
CONFIG_PRINTK=y, but it participates in the build when CONFIG_PRINTK=n.
So add compile dependency CONFIG_PRINTK=y && CONFIG_SYSCTL=y to fix the
build warning.

Link: https://lkml.kernel.org/r/20211129211943.640266-5-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Xiaoming Ni faaa357a55 printk: move printk sysctl to printk/sysctl.c
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move printk sysctl from kernel/sysctl.c to kernel/printk/sysctl.c.
Use register_sysctl() to register the sysctl interface.

[mcgrof@kernel.org: fixed compile issues when PRINTK is not set, commit log update]

Link: https://lkml.kernel.org/r/20211124231435.1445213-6-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Petr Mladek 5e8ba485b2 printk/console: Clean up boot console handling in register_console()
The variable @bcon has two meanings. It is used several times for iterating
the list of registered consoles. In the meantime, it holds the information
whether a boot console is first in @console_drivers list.

The information about the 1st console driver used to be important for
the decision whether to install the new console by default or not.
It allowed to re-evaluate the variable @need_default_console when
a real console with tty binding has been unregistered in the meantime.

The decision about the default console is not longer affected by @bcon
variable. The current code checks whether the first driver is real
and has tty binding directly.

The information about the first console is still used for two more
decisions:

  1. It prevents duplicate output on non-boot consoles with
     CON_CONSDEV flag set.

  2. Early/boot consoles are unregistered when a real console with
     CON_CONSDEV is registered and @keep_bootcon is not set.

The behavior in the real life is far from obvious. @bcon is set according
to the first console @console_drivers list. But the first position in
the list is special:

  1. Consoles with CON_CONSDEV flag are put at the beginning of
     the list. It is either the preferred console or any console
     with tty binding registered by default.

  2. Another console might become the first in the list when
     the first console in the list is unregistered. It might
     happen either explicitly or automatically when boot
     consoles are unregistered.

There is one more important rule:

  + Boot consoles can't be registered when any real console
    is already registered.

It is a puzzle. The main complication is the dependency on the first
position is the list and the complicated rules around it.

Let's try to make it easier:

1. Add variable @bootcon_enabled and set it by iterating all registered
   consoles. The variable has obvious meaning and more predictable
   behavior. Any speed optimization and other tricks are not worth it.

2. Use a generic name for the variable that is used to iterate
   the list on registered console drivers.

Behavior change:

No, maybe surprisingly, there is _no_ behavior change!

Let's provide the proof by contradiction. Both operations, duplicate
output prevention and boot consoles removal, are done only when
the newly added console has CON_CONSDEV flag set. The behavior
would change when the new @bootcon_enabled has different value
than the original @bcon.

By other words, the behavior would change when the following conditions
are true:

   + a console with CON_CONSDEV flag is added
   + a real (non-boot) console is the first in the list
   + a boot console is later in the list

Now, a real console might be first in the list only when:

   + It was the first registered console. In this case, there can't be
     any boot console because any later ones were rejected.

   + It was put at the first position because it had CON_CONSDEV flag
     set. It was either the preferred console or it was a console with
     tty binding registered by default. We are interested only in
     a real consoles here. And real console with tty binding fulfills
     conditions of the default console.

     Now, there is always only one console that is either preferred
     or fulfills conditions of the default console. It can't be already
     in the list and being registered at the same time.

As a result, the above three conditions could newer be "true" at
the same time. Therefore the behavior can't change.

Final dilemma:

OK, the new code has the same behavior. But is the change in the right
direction? What if the handling of @console_drivers is updated in
the future?

OK, let's look at it from another angle:

1. The ordering of @console_drivers list is important only in
   console_device() function. The first console driver with tty
   binding gets associated with /dev/console.

2. CON_CONSDEV flag is shown in /proc/consoles. And it should be set
   for the driver that is returned by console_device().

3. A boot console is removed and the duplicated output is prevented
   when the real console with CON_CONSDEV flag is registered.

Now, in the ideal world:

+ The driver associated with /dev/console should be either a console
  preferred via the command line, device tree, or SPCR. Or it should
  be the first real console with tty binding registered by default.

+ The code should match the related boot and real console drivers.
  It should unregister only the obsolete boot driver. And the duplicated
  output should be prevented only on the related real driver.

It is clear that it is not guaranteed by the current code. Instead,
the current code looks like a maze of heuristics that try to achieve
the above.

It is result of adding several features over last few decades. For example,
a possibility to register more consoles, unregister consoles, boot
consoles, consoles without tty binding, device tree, SPCR, braille
consoles.

Anyway, there is no reason why the decision, about removing boot consoles
and preventing duplicated output, should depend on the first console
in the list. The current code does the decisions primary by CON_CONSDEV
flag that is used for the preferred console. It looks like a
good compromise. And the change seems to be in the right direction.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-6-pmladek@suse.com
2021-12-06 14:07:57 +01:00
Petr Mladek 4f54693925 printk/console: Remove need_default_console variable
The variable @need_default_console is used to decide whether a newly
registered console should get enabled by default.

The logic is complicated. It can be modified in a register_console()
call. But it is always re-evaluated in the next call by the following
condition:

	if (need_default_console || bcon || !console_drivers)
		need_default_console = preferred_console < 0;

In short, the value is updated when either of the condition is valid:

  + the value is still, or again, "true"
  + boot/early console is still the first in @console_driver list
  + @console_driver list is empty

The value is updated according to @preferred_console. In particular,
it is set to "false" when a @preferred_console was set by
__add_preferred_console(). This happens when a non-braille console
was added via the command line, device tree, or SPCR.

It far from clear what this all means together. Let's look at
@need_default_console from another angle:

1. The value is "true" by default. It means that it is always set
   according to @preferred_console during the first register_console()
   call.

   By other words, the first register_console() call will register
   the console by default only when none non-braille console was defined
   via the command line, device tree, or SPCR.

2. The value will always stay "false" when @preferred_console is set.

   By other words, try_enable_default_console() will never get called
   when a non-braille console is explicitly required.

4. The value might be set to "false" in try_enable_default_console()
   when a console with tty binding (driver) gets enabled.

   In this case CON_CONSDEV is set as well. It causes that the console
   will be inserted as first into the list @console_driver. It might
   be either real or boot/early console.

5. The value will be set _back_ to "true" in the next register_console()
   call when:

      + The console added by the previous register_console() had been
	a boot/early one.

      + The last console has been unregistered in the meantime and
	a boot/early console became first in @console_drivers list
	again. Or the list became empty.

   By other words, the value will stay "false" only when the last
   registered console was real, had tty binding, and was not removed
   in the mean time.

The main logic looks clear:

  + Consoles are enabled by default only when no one is preferred
    via the command line, device tree, or SPCR.

  + By default, any console is enabled until a real console
    with tty binding gets registered.

The behavior when the real console with tty binding is later removed
is a bit unclear:

  + By default, any new console is registered again only when there
    is no console or the first console in the list is a boot one.

The question is why the code is suddenly happy when a real console
without tty binding is the first in the list. It looks like an overlook
and bug.

Conclusion:

The state of @preferred_console and the first console in @console_driver
list should be enough to decide whether we need to enable the given console
by default.

The rules are simple. New consoles are _not_ enabled by default
when either of the following conditions is true:

  + @preferred_console is set. It means that a non-braille console
    is explicitly configured via the command line, device tree, or SPCR.

  + A real console with tty binding is registered. Such a console will
    have CON_CONSDEV flag set and will always be the first in
    @console_drivers list.

Note:

The new code does not use @bcon variable. The meaning of the variable
is far from clear. The direct check of the first console in the list
makes it more clear that only real console fulfills requirements
of the default console.

Behavior change:

As already discussed above. There was one situation where the original
code worked a strange way. Let's have:

	+ console A: real console without tty binding
	+ console B: real console with tty binding

and do:

	register_console(A);	/* 1st step */
	register_console(B);	/* 2nd step */
	unregister_console(B);	/* 3rd step */
	register_console(B);	/* 4th step */

The original code will not register the console B in the 4th step.
@need_default_console is set to "false" in 2nd step. The real console
with tty binding (driver) is then removed in the 3rd step.
But @need_default_console will stay "false" in the 4th step because
there is no boot/early console and @registered_consoles list is not
empty.

The new code will register the console B in the 4th step because
it checks whether the first console has tty binding (->driver)

This behavior change should acceptable:

  1. The scenario requires manual intervention (console removal).
     The system should boot with the same consoles as before.

  2. Console B is registered again probably because the user wants
     to use it. The most likely scenario is that the related
     module is reloaded.

  3. It makes the behavior more consistent and predictable.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-5-pmladek@suse.com
2021-12-06 14:07:57 +01:00
Petr Mladek f873efe841 printk/console: Remove unnecessary need_default_console manipulation
There is no need to clear @need_default_console when a console
preferred by the command line, device tree, or SPCR, gets enabled.

The code is called only when some non-braille console matched a console
in @console_cmdline array. It means that a non-braille console was added
in __add_preferred_console() and the variable preferred_console is set
to a number >= 0. As a result, @need_default_console is always set to
"false" in the magic condition:

	if (need_default_console || bcon || !console_drivers)
		need_default_console = preferred_console < 0;

This is one small step in removing the above magic condition
that is hard to follow.

The patch removes one superfluous assignment and should not change
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211122132649.12737-4-pmladek@suse.com
2021-12-06 14:07:57 +01:00
Petr Mladek a6953370d2 printk/console: Rename has_preferred_console to need_default_console
The logic around the variable @has_preferred_console made my head
spin many times. Part of the problem is the ambiguous name.

There is the variable @preferred_console. It points to the last
non-braille console in @console_cmdline array. This array contains
consoles preferred via the command line, device tree, or SPCR.

Then there is the variable @has_preferred_console. It is set to
"true" when @preferred_console is enabled or when a console with
tty binding gets enabled by default.

It might get reset back by the magic condition:

	if (!has_preferred_console || bcon || !console_drivers)
		has_preferred_console = preferred_console >= 0;

It is a puzzle. Dumb explanation is that it gets re-evaluated
when:

	+ it was not set before (see above when it gets set)
	+ there is still an early console enabled (bcon)
	+ there is no console enabled (!console_drivers)

This is still a puzzle.

It gets more clear when we see where the value is checked. The only
meaning of the variable is to decide whether we should try to enable
the new console by default.

Rename the variable according to the single situation where
the value is checked.

The rename requires an inverted logic. Otherwise, it is a simple
search & replace. It does not change the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://lore.kernel.org/r/20211122132649.12737-3-pmladek@suse.com
2021-12-06 14:07:57 +01:00
Petr Mladek ed758b30d5 printk/console: Split out code that enables default console
Put the code enabling a console by default into a separate function
called try_enable_default_console().

Rename try_enable_new_console() to try_enable_preferred_console() to
make the purpose of the different variants more clear.

It is a code refactoring without any functional change.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Link: https://lore.kernel.org/r/20211122132649.12737-2-pmladek@suse.com
2021-12-06 14:07:57 +01:00
Linus Torvalds 7d5775d49e printk fixup for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmGWF6YACgkQUqAMR0iA
 lPJJ+RAAm9pi/EElKKl+lOlBl+ehJlKuNLnPQWFmmaRc9xd0ruUipp1nsaktLJ8f
 R/PkSwR/YWpBWlF8P4o+x9sOFyTNyLasoHtqsinEcAJI4lb7d1KOrPliTXyr15Ai
 A303djwJmwCw5KxAPOjkG/nMBlpMIAQRee9GDWs1ykfSlIsI4jp7isVbCFNCQNVF
 auHYq1bfJ5MJYPjxIDZUt+NF7kg7dD4k4g+VCVjaH1u8pGeaCUCtnNjMFOk1XfU8
 yFQnaDcrAu4zJPq3d74z4eN9Bk+su8+DhnfrAEFjuFxGTgYc2MyRt0gGFeiUtNs4
 rvST/eHBO4zeuL18S8G+fLcig/9ZYE73xzjdOCzRzLDjn0VQr9t06ez1QqJOb4D6
 A4SSufwek5NIqYKMlhV/az2EceQYK8Wv3KAz8w98KDfwvVVhUSgE23MbTCO0hvQU
 PWF35d3hQ+9oH0ZGYRumb8OpXtKJ+2KmzyN8Z0xhivHFBIKlcW6IBGhWRANclJO8
 jNAM3jiwi8fRDVM2wI1fmgeEmMhG+WuTI3dJVu3tu4vI923FW5GdY6ev5EvH0Ts0
 khTwIjtmCHUJGSeWajy3Gi9irdyhPyPNRMqgal4GS+gGpVU2mMMKTG+NzxxtCRKR
 BUgfCjFDoDJWrNWIzzOwNqgF0Y+V9GgCZOkb73u/y+xVx0Rmc6U=
 =wbBy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Try to flush backtraces from other CPUs also on the local one. This
   was a regression caused by printk_safe buffers removal.

 - Remove header dependency warning.

* tag 'printk-for-5.16-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: Remove printk.h inclusion in percpu.h
  printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
2021-11-18 10:50:45 -08:00
Petr Mladek bf6d0d1e1a Merge branch 'rework/printk_safe-removal' into for-linus 2021-11-18 10:03:47 +01:00
Nicholas Piggin 5d5e4522a7 printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
printk from NMI context relies on irq work being raised on the local CPU
to print to console. This can be a problem if the NMI was raised by a
lockup detector to print lockup stack and regs, because the CPU may not
enable irqs (because it is locked up).

Introduce printk_trigger_flush() that can be called another CPU to try
to get those messages to the console, call that where printk_safe_flush
was previously called.

Fixes: 93d102f094 ("printk: remove safe buffers")
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com
2021-11-10 16:12:00 +01:00
Linus Torvalds 512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Mike Rapoport 4421cca0a3 memblock: use memblock_free for freeing virtual pointers
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()

The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.

    @@
    identifier vaddr;
    expression size;
    @@
    (
    - memblock_phys_free(__pa(vaddr), size);
    + memblock_free(vaddr, size);
    |
    - memblock_free_ptr(vaddr, size);
    + memblock_free(vaddr, size);
    )

[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au

Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Linus Torvalds 0aaa58eca6 printk changes for 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmGBCBkACgkQUqAMR0iA
 lPLMdg/6Ag9V5Q6DPvbYe0WK8wfrrRL39Eic+K6wrYBVK/8rvMUy4Oee5tyOqCz7
 z9GM+SivWRtEdEy8X/HzoawMQEuy3jLcaFoCNxHcScmc6R5Sd8otxPU5Lo8aZPLN
 Pulni9EprysI2zhLqq5m6o/F9pMOY0y8uKbD1mgIHEV9yoLan+CZ+vahf/eFwYQu
 NtYlMoK2KbS2mChGOZuLsthhyNxcCNFWWNwpBBQz7iJ9ZvnKCZ3EwG7Nx34Rx7ZE
 TYZ2iga3TTONsoCk0IClbA6zRIowgumKQl9aY9Oci1MXdIEug42i0GEl+p4iCkrH
 VhLyPsvJG6xyE6aCg/p2SB1vPasY+pp94VfTjFfmMulYdUHK7ipfZCR3ddxayR4B
 PEsITibo/hHYEVerMMSyVXttiPS7qFhIyZkNuX/xpCMLz8RSFjgU5QhR848A4scM
 r+qv1p7xkdBRvH3jlStrpLRnGtqOucvbNQgyvQiinm0yunpJN8FZgEsHnP60E5+j
 DLpQF/bK2h7PhE2Wy8/iINi49/dZiIldZ1gZV4BxjuJ5zwSLdiuR9aP51RK4IRhV
 qraLwU6yNv0k4v6sjXV78inQQ2vkqy/MBYMe3zqnpYbJB2DZYCbeRE62whrdEd4W
 wxHxiY7r9dR6gtJB52kGepbryd3JIMdI49oFRjvGi2shaXG1AZ0=
 =t12m
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Extend %pGp print format to print hex value of the page flags

 - Use kvmalloc instead of kmalloc to allocate devkmsg buffers

 - Misc cleanup and warning fixes

* tag 'printk-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsprintf: Update %pGp documentation about that it prints hex value
  lib/vsprintf.c: Amend static asserts for format specifier flags
  vsprintf: Make %pGp print the hex value
  test_printf: Append strings more efficiently
  test_printf: Remove custom appending of '|'
  test_printf: Remove separate page_flags variable
  test_printf: Make pft array const
  ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
  printk: use gnu_printf format attribute for printk_sprint()
  printk: avoid -Wsometimes-uninitialized warning
  printk: use kvmalloc instead of kmalloc for devkmsg_user
2021-11-02 10:53:45 -07:00
John Ogness 264a750472 printk: use gnu_printf format attribute for printk_sprint()
Fix the following W=1 kernel build warning:

kernel/printk/printk.c: In function 'printk_sprint':
kernel/printk/printk.c:1913:9: warning: function 'printk_sprint' might be
a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210927142203.124730-1-john.ogness@linutronix.de
2021-10-04 10:35:13 +02:00
Linus Torvalds 77e02cf57b memblock: introduce saner 'memblock_free_ptr()' interface
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.

Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:

   https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/

or just random memory corruption if the debug checks don't catch it:

   https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/

I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.

I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.

So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer.  And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-14 13:23:22 -07:00
Yong-Taek Lee 9980c4251f printk: use kvmalloc instead of kmalloc for devkmsg_user
Size of struct devkmsg_user increased to 16784 by commit 896fbe20b4
("printk: use the lockless ringbuffer") so order3(32kb) is needed for
kmalloc. Under stress conditions the kernel may temporary fail to
allocate 32k with kmalloc. Use kvmalloc instead of kmalloc to aviod
this issue.

qseecomd invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=-1000
Call trace:
 dump_backtrace+0x0/0x34c
 dump_stack_lvl+0xd4/0x16c
 dump_header+0x5c/0x338
 out_of_memory+0x374/0x4cc
 __alloc_pages_slowpath+0xbc8/0x1130
 __alloc_pages_nodemask+0x170/0x1b0
 kmalloc_order+0x5c/0x24c
 devkmsg_open+0x1f4/0x558
 memory_open+0x94/0xf0
 chrdev_open+0x288/0x3dc
 do_dentry_open+0x2b4/0x618
 path_openat+0xce4/0xfa8
 do_filp_open+0xb0/0x164
 do_sys_openat2+0xa8/0x264
 __arm64_sys_openat+0x70/0xa0
 el0_svc_common+0xc4/0x270
 el0_svc+0x34/0x9c
 el0_sync_handler+0x88/0xf0
 el0_sync+0x1bc/0x200

 DMA32: 4521*4kB (UMEC) 1377*8kB (UMECH) 73*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 30268kB
 Normal: 2490*4kB (UMEH) 277*8kB (UMH) 27*16kB (UH) 1*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12640kB

Signed-off-by: Yong-Taek Lee <ytk.lee@samsung.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210830071701epcms1p70f72ae10940bc407a3c33746d20da771@epcms1p7
2021-09-07 14:57:01 +02:00
Petr Mladek c985aafb60 Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
Petr Mladek 715d3edb79 Merge branch 'rework/fixup-for-5.15' into for-linus 2021-08-30 16:33:04 +02:00
Petr Mladek baa99c9267 Merge branch 'for-5.15-verbose-console' into for-linus 2021-08-30 14:56:28 +02:00
Dmitry Safonov 10102a890b printk: Add printk.console_no_auto_verbose boot parameter
console_verbose() increases console loglevel to
CONSOLE_LOGLEVEL_MOTORMOUTH, which provides more information
to debug a panic/oops.

Unfortunately, in Arista we maintain some DUTs (Device Under Test) that
are configured to have 9600 baud rate. While verbose console messages
have their value to post-analyze crashes, on such setup they:
- may prevent panic/oops messages being printed
- take too long to flush on console resulting in watchdog reboot

In all our setups we use kdump which saves dmesg buffer after panic,
so in reality those extra messages on console provide no additional value,
but rather add risk of not getting to __crash_kexec().

Provide printk.console_no_auto_verbose boot parameter, which allows
to switch off printk being verbose on oops/panic/lockdep.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210727130635.675184-3-dima@arista.com
2021-07-29 16:29:35 +02:00
John Ogness 8d909b2333 printk: syslog: close window between wait and read
Syslog's SYSLOG_ACTION_READ is supposed to block until the next
syslog record can be read, and then it should read that record.
However, because @syslog_lock is not held between waking up and
reading the record, another reader could read the record first,
thus causing SYSLOG_ACTION_READ to return with a value of 0, never
having read _anything_.

By holding @syslog_lock between waking up and reading, it can be
guaranteed that SYSLOG_ACTION_READ blocks until it successfully
reads a syslog record (or a real error occurs).

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-7-john.ogness@linutronix.de
2021-07-26 15:09:57 +02:00
John Ogness b371cbb584 printk: convert @syslog_lock to mutex
@syslog_lock was a raw_spin_lock to simplify the transition of
removing @logbuf_lock and the safe buffers. With that transition
complete, and since all uses of @syslog_lock are within sleepable
contexts, @syslog_lock can become a mutex.

Note that until now register_console() would disable interrupts
using irqsave, which implies that it may be called with interrupts
disabled. And indeed, there is one possible call chain on parisc
where this happens:

handle_interruption(code=1) /* High-priority machine check (HPMC) */
  pdc_console_restart()
    pdc_console_init_force()
      register_console()

However, register_console() calls console_lock(), which might sleep.
So it has never been allowed to call register_console() from an
atomic context and the above call chain is a bug.

Note that the removal of read_syslog_seq_irq() is slightly changing
the behavior of SYSLOG_ACTION_READ by testing against a possibly
outdated @seq value. However, the value of @seq could have changed
after the test, so it is not a new window. A follow-up commit closes
this window.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-6-john.ogness@linutronix.de
2021-07-26 15:09:50 +02:00
John Ogness 93d102f094 printk: remove safe buffers
With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console and console_owner locks is left
in place. This is because the console and console_owner locks are
needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de
2021-07-26 15:09:34 +02:00
John Ogness 002eb6ad07 printk: track/limit recursion
Currently the printk safe buffers provide a form of recursion
protection by redirecting to the safe buffers whenever printk() is
recursively called.

In preparation for removal of the safe buffers, provide an alternate
explicit recursion protection. Recursion is limited to 3 levels
per-CPU and per-context.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-3-john.ogness@linutronix.de
2021-07-26 15:07:15 +02:00
Jonathan Corbet 7d9e2661f2 printk: Move the printk() kerneldoc comment to its new home
Commit 3370155737 ("printk: Userspace format indexing support") turned
printk() into a macro, but left the kerneldoc comment for it with the (now)
_printk() function, resulting in this docs-build warning:

  kernel/printk/printk.c:1: warning: 'printk' not found

Move the kerneldoc comment back next to the (now) macro it's meant to
describe and have the docs build find it there.

Fixes: 3370155737 ("printk: Userspace format indexing support")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/87o8aqt7qn.fsf@meer.lwn.net
2021-07-26 12:36:44 +02:00
Chris Down 3370155737 printk: Userspace format indexing support
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
   remediation for the machine), rinse, and repeat.

As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:

    $ head -1 vmlinux; shuf -n 5 vmlinux
    # <level[,flags]> filename:line function "format"
    <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
    <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
    <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
    <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
    <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
2021-07-19 11:57:48 +02:00
Chris Down f3d75cf537 printk: Rework parse_prefix into printk_parse_prefix
parse_prefix is needed externally by later patches, so move it into a
context where it can be used as such. Also give it the printk_ prefix to
reduce the chance of collisions.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/b22ba314a860e5c7f887958f1eab2649f9bd1d06.1623775748.git.chris@chrisdown.name
2021-07-19 11:57:18 +02:00
Chris Down a1ad4b8a19 printk: Straighten out log_flags into printk_info_flags
In the past, `enum log_flags` was part of `struct log`, hence the name.
`struct log` has since been reworked and now this struct is stored
inside `struct printk_info`. However, the name was never updated, which
is somewhat confusing -- especially since these flags operate at the
record level rather than at the level of an abstract log.

printk_info_flags also joins its other metadata struct friends in
printk_ringbuffer.h.

Signed-off-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/3dd801982f02603e6e3aa4f8bc4f5ebb830a4949.1623775748.git.chris@chrisdown.name
2021-07-19 11:56:40 +02:00
Petr Mladek 11e4b63abb printk/console: Check consistent sequence number when handling race in console_unlock()
The standard printk() tries to flush the message to the console
immediately. It tries to take the console lock. If the lock is
already taken then the current owner is responsible for flushing
even the new message.

There is a small race window between checking whether a new message is
available and releasing the console lock. It is solved by re-checking
the state after releasing the console lock. If the check is positive
then console_unlock() tries to take the lock again and process the new
message as well.

The commit 996e966640 ("printk: remove logbuf_lock") causes that
console_seq is not longer read atomically. As a result, the re-check might
be done with an inconsistent 64-bit index.

Solve it by using the last sequence number that has been checked under
the console lock. In the worst case, it will take the lock again only
to realized that the new message has already been proceed. But it
was possible even before.

The variable next_seq is marked as __maybe_unused to call down compiler
warning when CONFIG_PRINTK is not defined.

Fixes: commit 996e966640 ("printk: remove logbuf_lock")
Reported-by: kernel test robot <lkp@intel.com>  # unused next_seq warning
Cc: stable@vger.kernel.org # 5.13
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20210702150657.26760-1-pmladek@suse.com
2021-07-08 18:45:35 +02:00
John Ogness 3342aa8e6b printk: fix cpu lock ordering
The cpu lock implementation uses a full memory barrier to take
the lock, but no memory barriers when releasing the lock. This
means that changes performed by a lock owner may not be seen by
the next lock owner. This may have been "good enough" for use
by dump_stack() as a serialization mechanism, but it is not
enough to provide proper protection for a critical section.

Correct this problem by using acquire/release memory barriers
for lock/unlock, respectively.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210617095051.4808-3-john.ogness@linutronix.de
2021-06-22 09:57:15 +02:00
John Ogness 766c268bc6 lib/dump_stack: move cpu lock to printk.c
dump_stack() implements its own cpu-reentrant spinning lock to
best-effort serialize stack traces in the printk log. However,
there are other functions (such as show_regs()) that can also
benefit from this serialization.

Move the cpu-reentrant spinning lock (cpu lock) into new helper
functions printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore()
so that it is available for others as well. For !CONFIG_SMP the
cpu lock is a NOP.

Note that having multiple cpu locks in the system can easily
lead to deadlock. Code needing a cpu lock should use the
printk cpu lock, since the printk cpu lock could be acquired
from any code and any context.

Also note that it is not necessary for a cpu lock to disable
interrupts. However, in upcoming work this cpu lock will be used
for emergency tasks (for example, atomic consoles during kernel
crashes) and any interruptions while holding the cpu lock should
be avoided if possible.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Backported on top of 5.13-rc1.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210617095051.4808-2-john.ogness@linutronix.de
2021-06-22 09:56:10 +02:00
Bhaskar Chowdhury acebb5597f kernel/printk.c: Fixed mundane typos
s/sempahore/semaphore/
s/exacly/exactly/
s/unregistred/unregistered/
s/interation/iteration/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Removed 4th hunk. The string has already been removed in the meantime.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210328043932.8310-1-unixbhaskar@gmail.com
2021-03-30 15:34:17 +02:00
Rasmus Villemoes 28e1745b9f printk: rename vprintk_func to vprintk
The printk code is already hard enough to understand. Remove an
unnecessary indirection by renaming vprintk_func to vprintk (adding
the asmlinkage annotation), and removing the vprintk definition from
printk.c. That way, printk is implemented in terms of vprintk as one
would expect, and there's no "vprintk_func, what's that? Some function
pointer that gets set where?"

The declaration of vprintk in linux/printk.h already has the
__printf(1,0) attribute, there's no point repeating that with the
definition - it's for diagnostics in callers.

linux/printk.h already contains a static inline {return 0;} definition
of vprintk when !CONFIG_PRINTK.

Since the corresponding stub definition of vprintk_func was not marked
"static inline", any translation unit including internal.h would get a
definition of vprintk_func - it just so happens that for
!CONFIG_PRINTK, there is precisely one such TU, namely printk.c. Had
there been more, it would be a link error; now it's just a silly waste
of a few bytes of .text, which one must assume are rather precious to
anyone disabling PRINTK.

$ objdump -dr kernel/printk/printk.o
00000330 <vprintk_func>:
 330:   31 c0                   xor    %eax,%eax
 332:   c3                      ret
 333:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
 33a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210323144201.486050-1-linux@rasmusvillemoes.dk
2021-03-30 15:21:18 +02:00
John Ogness 505a27a734 printk: console: remove unnecessary safe buffer usage
Upon registering a console, safe buffers are activated when setting
up the sequence number to replay the log. However, these are already
protected by @console_sem and @syslog_lock. Remove the unnecessary
safe buffer usage.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-16-john.ogness@linutronix.de
2021-03-08 11:43:38 +01:00
John Ogness a4f9876532 printk: kmsg_dump: remove _nolock() variants
kmsg_dump_rewind() and kmsg_dump_get_line() are lockless, so there is
no need for _nolock() variants. Remove these functions and switch all
callers of the _nolock() variants.

The functions without _nolock() were chosen because they are already
exported to kernel modules.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-15-john.ogness@linutronix.de
2021-03-08 11:43:35 +01:00
John Ogness 996e966640 printk: remove logbuf_lock
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.

@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-14-john.ogness@linutronix.de
2021-03-08 11:43:32 +01:00
John Ogness f9f3f02db9 printk: introduce a kmsg_dump iterator
Rather than storing the iterator information in the registered
kmsg_dumper structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller, thus
allowing lockless use of the kmsg_dump functions.

Update code that accesses the kernel logs using the kmsg_dumper
structure to use the new kmsg_dump_iter structure. For kmsg_dumpers,
this also means adding a call to kmsg_dump_rewind() to initialize
the iterator.

All this is in preparation for removal of @logbuf_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org> # pstore
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-13-john.ogness@linutronix.de
2021-03-08 11:43:27 +01:00
John Ogness 5f6c7648e5 printk: kmsg_dumper: remove @active field
All 6 kmsg_dumpers do not benefit from the @active flag:

  (provide their own synchronization)
  - arch/powerpc/kernel/nvram_64.c
  - arch/um/kernel/kmsg_dump.c
  - drivers/mtd/mtdoops.c
  - fs/pstore/platform.c

  (only dump on KMSG_DUMP_PANIC, which does not require
  synchronization)
  - arch/powerpc/platforms/powernv/opal-kmsg.c
  - drivers/hv/vmbus_drv.c

The other 2 kmsg_dump users also do not rely on @active:

  (hard-code @active to always be true)
  - arch/powerpc/xmon/xmon.c
  - kernel/debug/kdb/kdb_main.c

Therefore, @active can be removed.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-12-john.ogness@linutronix.de
2021-03-08 11:43:23 +01:00
John Ogness 636babdc06 printk: add syslog_lock
The global variables @syslog_seq, @syslog_partial, @syslog_time
and write access to @clear_seq are protected by @logbuf_lock.
Once @logbuf_lock is removed, these variables will need their
own synchronization method. Introduce @syslog_lock for this
purpose.

@syslog_lock is a raw_spin_lock for now. This simplifies the
transition to removing @logbuf_lock. Once @logbuf_lock and the
safe buffers are removed, @syslog_lock can change to spin_lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-11-john.ogness@linutronix.de
2021-03-08 11:43:20 +01:00
John Ogness 35b2b16348 printk: use atomic64_t for devkmsg_user.seq
@user->seq is indirectly protected by @logbuf_lock. Once @logbuf_lock
is removed, @user->seq will be no longer safe from an atomicity point
of view.

In preparation for the removal of @logbuf_lock, change it to
atomic64_t to provide this safety.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-10-john.ogness@linutronix.de
2021-03-08 11:43:18 +01:00
John Ogness 7d7a23a91c printk: use seqcount_latch for clear_seq
kmsg_dump_rewind_nolock() locklessly reads @clear_seq. However,
this is not done atomically. Since @clear_seq is 64-bit, this
cannot be an atomic operation for all platforms. Therefore, use
a seqcount_latch to allow readers to always read a consistent
value.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-9-john.ogness@linutronix.de
2021-03-08 11:43:15 +01:00
John Ogness cf5b0208fd printk: introduce CONSOLE_LOG_MAX
Instead of using "LOG_LINE_MAX + PREFIX_MAX" for temporary buffer
sizes, introduce CONSOLE_LOG_MAX. This represents the maximum size
that is allowed to be printed to the console for a single record.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-8-john.ogness@linutronix.de
2021-03-08 11:43:12 +01:00
John Ogness 4260e0e551 printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
The logic for finding records to fit into a buffer is the same for
kmsg_dump_get_buffer() and syslog_print_all(). Introduce a helper
function find_first_fitting_seq() to handle this logic.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-7-john.ogness@linutronix.de
2021-03-08 11:43:09 +01:00
John Ogness 726b509770 printk: refactor kmsg_dump_get_buffer()
kmsg_dump_get_buffer() requires nearly the same logic as
syslog_print_all(), but uses different variable names and
does not make use of the ringbuffer loop macros. Modify
kmsg_dump_get_buffer() so that the implementation is as similar
to syslog_print_all() as possible.

A follow-up commit will move this common logic into a
separate helper function.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-6-john.ogness@linutronix.de
2021-03-08 11:43:05 +01:00
John Ogness bb07b16c44 printk: limit second loop of syslog_print_all
The second loop of syslog_print_all() subtracts lengths that were
added in the first loop. With commit b031a684bf ("printk: remove
logbuf_lock writer-protection of ringbuffer") it is possible that
records are (over)written during syslog_print_all(). This allows the
possibility of the second loop subtracting lengths that were never
added in the first loop.

This situation can result in syslog_print_all() filling the buffer
starting from a later record, even though there may have been room
to fit the earlier record(s) as well.

Fixes: b031a684bf ("printk: remove logbuf_lock writer-protection of ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-4-john.ogness@linutronix.de
2021-03-08 11:42:50 +01:00
Petr Mladek 16182ac1f0 Merge branch 'printk-rework' into for-linus 2021-02-22 13:43:55 +01:00
John Ogness 13791c80b0 printk: avoid prb_first_valid_seq() where possible
If message sizes average larger than expected (more than 32
characters), the data_ring will wrap before the desc_ring. Once the
data_ring wraps, it will start invalidating descriptors. These
invalid descriptors hang around until they are eventually recycled
when the desc_ring wraps. Readers do not care about invalid
descriptors, but they still need to iterate past them. If the
average message size is much larger than 32 characters, then there
will be many invalid descriptors preceding the valid descriptors.

The function prb_first_valid_seq() always begins at the oldest
descriptor and searches for the first valid descriptor. This can
be rather expensive for the above scenario. And, in fact, because
of its heavy usage in /dev/kmsg, there have been reports of long
delays and even RCU stalls.

For code that does not need to search from the oldest record,
replace prb_first_valid_seq() usage with prb_read_valid_*()
functions, which provide a start sequence number to search from.

Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: J. Avila <elavila@google.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210211173152.1629-1-john.ogness@linutronix.de
2021-02-12 17:54:59 +01:00
Petr Mladek 61bb17da44 Merge branch 'printk-rework' into for-linus 2021-01-25 14:29:35 +01:00
John Ogness 08d60e5999 printk: fix string termination for record_print_text()
Commit f0e386ee0c ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee0c ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
2021-01-25 10:37:08 +01:00
Petr Mladek 535b6a122c Merge branch 'printk-rework' into for-linus 2021-01-21 16:06:21 +01:00