Commit Graph

41 Commits

Author SHA1 Message Date
Ross Zwisler 2abfcd293b docs: ftrace: always use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many parts of Documentation still reference this older debugfs path, so
let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230125213251.2013791-1-zwisler@google.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-31 14:02:30 -07:00
Christian Brauner ed5a7047d2
attr: use consistent sgid stripping checks
Currently setgid stripping in file_remove_privs()'s should_remove_suid()
helper is inconsistent with other parts of the vfs. Specifically, it only
raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
inode isn't in the caller's groups and the caller isn't privileged over the
inode although we require this already in setattr_prepare() and
setattr_copy() and so all filesystem implement this requirement implicitly
because they have to use setattr_{prepare,copy}() anyway.

But the inconsistency shows up in setgid stripping bugs for overlayfs in
xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping works
correctly when performing various write-like operations as an unprivileged
user (fallocate, reflink, write, etc.):

echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k

The test basically creates a file with 6666 permissions. While the file has
the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
regular filesystem like xfs what will happen is:

sys_fallocate()
-> vfs_fallocate()
   -> xfs_file_fallocate()
      -> file_modified()
         -> __file_remove_privs()
            -> dentry_needs_remove_privs()
               -> should_remove_suid()
            -> __remove_privs()
               newattrs.ia_valid = ATTR_FORCE | kill;
               -> notify_change()
                  -> setattr_copy()

In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.

But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.

So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:

ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);

which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

Now we call into the filesystem's ->setattr() inode operation which will
end up calling setattr_copy(). Since ATTR_MODE is set we will hit:

if (ia_valid & ATTR_MODE) {
        umode_t mode = attr->ia_mode;
        vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
        if (!vfsgid_in_group_p(vfsgid) &&
            !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                mode &= ~S_ISGID;
        inode->i_mode = mode;
}

and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.

But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
has the consequence that neither the setgid nor the suid bits are stripped
even though it should be stripped because the inode isn't in the caller's
groups and the caller isn't privileged over the inode.

If overlayfs is in the mix things become a bit more complicated and the bug
shows up more clearly. When e.g., ovl_setattr() is hit from
ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
ATTR_KILL_SGID might be raised but because the check in notify_change() is
questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
stripped the S_ISGID bit isn't removed even though it should be stripped:

sys_fallocate()
-> vfs_fallocate()
   -> ovl_fallocate()
      -> file_remove_privs()
         -> dentry_needs_remove_privs()
            -> should_remove_suid()
         -> __remove_privs()
            newattrs.ia_valid = ATTR_FORCE | kill;
            -> notify_change()
               -> ovl_setattr()
                  // TAKE ON MOUNTER'S CREDS
                  -> ovl_do_notify_change()
                     -> notify_change()
                  // GIVE UP MOUNTER'S CREDS
     // TAKE ON MOUNTER'S CREDS
     -> vfs_fallocate()
        -> xfs_file_fallocate()
           -> file_modified()
              -> __file_remove_privs()
                 -> dentry_needs_remove_privs()
                    -> should_remove_suid()
                 -> __remove_privs()
                    newattrs.ia_valid = attr_force | kill;
                    -> notify_change()

The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper to perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have notify_change()
not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
first place because the caller must calculate the flags via
should_remove_suid() anyway which would raise ATTR_KILL_SGID.

While we're at it we move should_remove_suid() from inode.c to attr.c
where it belongs with the rest of the iattr helpers. Especially since it
returns ATTR_KILL_S{G,U}ID flags. We also rename it to
setattr_should_drop_suidgid() to better reflect that it indicates both
setuid and setgid bit removal and also that it returns attr flags.

Running xfstests with this doesn't report any regressions. We should really
try and use consistent checks.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-18 10:09:47 +02:00
Leo Yan 9c1ab6d54a docs: ftrace: Correct access mode
The documentation gives an example for opening trace marker with
write-only mode, but the flag WR_ONLY is not defined by glibc.

Use O_WRONLY to replace it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221008083250.3160-1-leo.yan@linaro.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-10 12:50:53 -06:00
Kurt Kanzenbach 4d1257bbc2 tracing: Add documentation for trace clock tai
Add documentation for newly introduced trace clock "tai".
This clock corresponds to CLOCK_TAI.

Link: https://lkml.kernel.org/r/20220414091805.89667-4-kurt@linutronix.de

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-27 17:19:30 -04:00
Huichun Feng cc2cf6796a docs: ftrace: fix ambiguous sentence
The sentence looks ambiguous, rephrase it by adding ", there".

Signed-off-by: Huichun Feng <foxhoundsk.tw@gmail.com>
Signed-off-by: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Signed-off-by: Chun-Hung Tseng <henrybear327@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220111052000.2675944-1-foxhoundsk.tw@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-17 14:56:16 -07:00
Zhaoyu Liu 951e0d0020 docs: ftrace: fix the wrong path of tracefs
Delete "tracing" due to it has been included in /proc/mounts.
Delete "echo nop > $tracefs/tracing/current_tracer", maybe
this command is redundant.

Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-15 02:50:39 -07:00
Sebastian Andrzej Siewior c7483d823e Documentation: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Update the documentation accordingly.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-2-bigeasy@linutronix.de
2021-08-28 01:46:17 +02:00
Mauro Carvalho Chehab 81a2d57873 docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
The :doc:`foo` tag is auto-generated via automarkup.py.
So, use the filename at the sources, instead of :doc:`foo`.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cf9b03ff4b7917d9846503f198372bc6b821445b.1623824363.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-06-17 13:24:39 -06:00
Haocheng Xie 6ad1800071 docs: Fix typos in Documentation/trace/ftrace.rst
Fix the usage of "a/the" and improve the readability.

Signed-off-by: Haocheng Xie <xiehaocheng.cn@gmail.com>
Link: https://lore.kernel.org/r/20210531083905.25763-1-xiehaocheng.cn@gmail.com
[jc: tweaked wording slightly]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-06-01 16:52:28 -06:00
Masami Hiramatsu a345a6718b tracing: Add ptr-hash option to show the hashed pointer value
Add tracefs/options/hash-ptr option to show hashed pointer
value by %p in event printk format string.

For the security reason, normal printk will show the hashed
pointer value (encrypted by random number) with %p to printk
buffer to hide the real address. But the tracefs/trace always
shows real address for debug. To bridge those outputs, add an
option to switch the output format. Ftrace users can use it
to find the hashed value corresponding to the real address
in trace log.

Link: https://lkml.kernel.org/r/160277372504.29307.14909828808982012211.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-11 16:31:57 -05:00
Linus Torvalds 2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Randy Dunlap 1747db54f8 Documentation: trace/ftrace: drop doubled words
Drop the doubled words "for" and "that".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20200703212453.30553-2-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-05 14:40:55 -06:00
Manbing 3e28c5ca1d trace doc: correct names of varous tracing documents
Some documents were converted from the plain text documentation
to reStructuredText format; update references to match.

Signed-off-by: Manbing <manbing3@gmail.com>
Link: https://lore.kernel.org/r/20200630185356.3467-1-manbing3@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-05 14:08:35 -06:00
Christoph Hellwig ed00aabd5e block: rename generic_make_request to submit_bio_noacct
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 07:27:24 -06:00
Eric W. Biederman 2388777a0a exec: Rename flush_old_exec begin_new_exec
There is and has been for a very long time been a lot more going on in
flush_old_exec than just flushing the old state.  After the movement
of code from setup_new_exec there is a whole lot more going on than
just flushing the old executables state.

Rename flush_old_exec to begin_new_exec to more accurately reflect
what this function does.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-07 16:55:47 -05:00
Steven Rostedt (VMware) 2ab2a0924b tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
Update the tracing documentation to reflect the new files in the tracing
directory.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Steven Rostedt (VMware) 8a815e6b8b tracing: Have the document reflect that the trace file keeps tracing enabled
Now that reading the trace file does not temporarly stop tracing while it is
open, update the document to reflect this fact.

Link: http://lkml.kernel.org/r/20200317213417.209675068@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Steven Rostedt (VMware) 06e0a548ba tracing: Do not disable tracing when reading the trace file
When opening the "trace" file, it is no longer necessary to disable tracing.

Note, a new option is created called "pause-on-trace", when set, will cause
the trace file to emulate its original behavior.

Link: http://lkml.kernel.org/r/20200317213416.903351225@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:01 -04:00
Steven Rostedt (VMware) b396bfdebf tracing: Have hwlat ts be first instance and record count of instances
The hwlat tracer runs a loop of width time during a given window. It then
reports the max latency over a given threshold and records a timestamp. But
this timestamp is the time after the width has finished, and not the time it
actually triggered.

Record the actual time when the latency was greater than the threshold as
well as the number of times it was greater in a given width per window.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-03 17:33:43 -05:00
Frank A. Cancio Bello 5b8914a67e docs: ftrace: Fix typos
Fix minor typos in the doc.

Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frank A. Cancio Bello <frank@generalsoftwareinc.com>
Link: https://lore.kernel.org/r/9ef705d0208a4ca0852fed69bc0838a589a4df85.1577231751.git.frank@generalsoftwareinc.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-01-10 10:52:28 -07:00
Frank A. Cancio Bello a65d634e63 docs: ftrace: Clarify the RAM impact of buffer_size_kb
The current text could mislead the user into believing that the number
of pages allocated by each CPU ring buffer is calculated by the round
up of the division: buffer_size_kb / PAGE_SIZE.

Clarifies that a few extra pages may be allocated to accommodate buffer
management meta-data.

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frank A. Cancio Bello <frank@generalsoftwareinc.com>
Link: https://lore.kernel.org/r/6f33be5f3d60e5ffc061d8d2b329d3d3ccf22a8c.1577231751.git.frank@generalsoftwareinc.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-01-10 10:51:34 -07:00
Frank A. Cancio Bello d693b2883c docs: ftrace: Specifies when buffers get clear
Clarify a few places where the ring buffer and the "snapshot" buffer
are cleared as a side effect of an operation.

This will avoid users lost of tracing data because of these so far
undocumented behavior.

Signed-off-by: Frank A. Cancio Bello <frank@generalsoftwareinc.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20191218191553.q4lwyxmquvtjzjfz@frank-laptop
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-12-19 09:18:33 -07:00
Miroslav Benes 7162431dcf ftrace: Introduce PERMANENT ftrace_ops flag
Livepatch uses ftrace for redirection to new patched functions. It means
that if ftrace is disabled, all live patched functions are disabled as
well. Toggling global 'ftrace_enabled' sysctl thus affect it directly.
It is not a problem per se, because only administrator can set sysctl
values, but it still may be surprising.

Introduce PERMANENT ftrace_ops flag to amend this. If the
FTRACE_OPS_FL_PERMANENT is set on any ftrace ops, the tracing cannot be
disabled by disabling ftrace_enabled. Equally, a callback with the flag
set cannot be registered if ftrace_enabled is disabled.

Link: http://lkml.kernel.org/r/20191016113316.13415-2-mbenes@suse.cz

Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-04 09:33:15 -05:00
Peter Wu f12fcca653 docs: ftrace: clarify when tracing is disabled by the trace file
The current text could mislead the user into believing that only read()
disables tracing. Clarify that any open() call that requests read access
disables tracing.

Link: https://lkml.kernel.org/r/CAADnVQ+hU6QOC_dPmpjnuv=9g4SQEeaMEMqXOS2WpMj=q=LdiQ@mail.gmail.com
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-09-02 07:14:53 -06:00
Linus Torvalds d2d8b14604 The major changes in this tracing update includes:
- Removing of non-DYNAMIC_FTRACE from 32bit x86
 
  - Removing of mcount support from x86
 
  - Emulating a call from int3 on x86_64, fixes live kernel patching
 
  - Consolidated Tracing Error logs file
 
 Minor updates:
 
  - Removal of klp_check_compiler_support()
 
  - kdb ftrace dumping output changes
 
  - Accessing and creating ftrace instances from inside the kernel
 
  - Clean up of #define if macro
 
  - Introduction of TRACE_EVENT_NOP() to disable trace events based on config
    options
 
 And other minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r
 ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw=
 =RVQo
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major changes in this tracing update includes:

   - Removal of non-DYNAMIC_FTRACE from 32bit x86

   - Removal of mcount support from x86

   - Emulating a call from int3 on x86_64, fixes live kernel patching

   - Consolidated Tracing Error logs file

  Minor updates:

   - Removal of klp_check_compiler_support()

   - kdb ftrace dumping output changes

   - Accessing and creating ftrace instances from inside the kernel

   - Clean up of #define if macro

   - Introduction of TRACE_EVENT_NOP() to disable trace events based on
     config options

  And other minor fixes and clean ups"

* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  x86: Hide the int3_emulate_call/jmp functions from UML
  livepatch: Remove klp_check_compiler_support()
  ftrace/x86: Remove mcount support
  ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
  tracing: Simplify "if" macro code
  tracing: Fix documentation about disabling options using trace_options
  tracing: Replace kzalloc with kcalloc
  tracing: Fix partial reading of trace event's id file
  tracing: Allow RCU to run between postponed startup tests
  tracing: Fix white space issues in parse_pred() function
  tracing: Eliminate const char[] auto variables
  ring-buffer: Fix mispelling of Calculate
  tracing: probeevent: Fix to make the type of $comm string
  tracing: probeevent: Do not accumulate on ret variable
  tracing: uprobes: Re-enable $comm support for uprobe events
  ftrace/x86_64: Emulate call function while updating in breakpoint handler
  x86_64: Allow breakpoints to emulate call instructions
  x86_64: Add gap to int3 to allow for call emulation
  tracing: kdb: Allow ftdump to skip all but the last few entries
  tracing: Add trace_total_entries() / trace_total_entries_cpu()
  ...
2019-05-15 16:05:47 -07:00
Mauro Carvalho Chehab 3df5ffd2e5 docs: trace: fix some Sphinx warnings
There are some warnings produced when building trace. Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-04-16 09:08:32 -06:00
Tom Zanussi 26a9449179 tracing: Add tracing/error_log Documentation
Move most of the hist trigger extended error documentation to
ftrace.rst and expand on it to fully document tracing/error_log.

Link: http://lkml.kernel.org/r/c5d53c8f643ef6844d6ad8d0200c116936730b01.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:50 -04:00
Steven Rostedt (VMware) f79b3f3385 ftrace: Allow enabling of filters via index of available_filter_functions
Enabling of large number of functions by echoing in a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which is in the 10s of thousands), and doing a kallsyms lookup to convert
the ip address into a name, then comparing that name with the string passed
in.

When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically a
O(n^2) operation. With 40,000 functions, that ends up bing 1,600,000,000
opertions! And enabling this can take over 20 minutes.

As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:

 # echo 50 > set_ftrace_filter
 # cat set_ftrace_filter
 x86_pmu_commit_txn

 # head -50 available_filter_functions | tail -1
 x86_pmu_commit_txn

This allows setting of half the available_filter_functions to take place in
less than a second!

 # time seq 20000 > set_ftrace_filter
 real    0m0.042s
 user    0m0.005s
 sys     0m0.015s

 # wc -l set_ftrace_filter
 20000 set_ftrace_filter

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-15 13:10:09 -05:00
Changbin Du 88d380eb06 tracing/doc: Add latency tracer funcgraph example
This add an example about how to use funcgraph with latency tracers.

Link: http://lkml.kernel.org/r/20190101154614.8887-6-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:19 -05:00
Amir Livneh 2a1e03ca33 doc: tracing: Fix a number of typos
Trivial fixes to spelling mistakes in ftrace.rst

v2: tripple -> triple

Signed-off-by: Amir Livneh <alivneh@fb.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-11-07 15:52:39 -07:00
Masami Hiramatsu 8a2933cfe7 docs: tracing: Add stacktrace filter command
Add a description of stacktrace filter command.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-08-31 16:34:49 -06:00
Masami Hiramatsu 1fee4f7752 doc: tracing: Fix a typo of trace_stat
The name of the directory for per-cpu function statistics
is trace_stat, not trace_stats.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-07-26 15:48:10 -06:00
Linus Torvalds 5eb6eed7e0 One new feature was added to ftrace, which is the trace_marker now supports
triggers. For example:
 
   # cd /sys/kernel/debug/tracing
   # echo 'snapshot' > events/ftrace/print/trigger
   # echo 'cause snapshot' > trace_marker
 
 The rest of the changes are various clean ups and also one stable fix that
 was added late in the cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCWxftjBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qosyAQCgmhb+/HgRYIbsKa8Q70E2pMx5A/za
 4rQ37Q9PhDUJlAD/aWXfrqcMm+buAU0yY4K+eb+f2feKXWXxdZ/fuh7+nA4=
 =lpXM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "One new feature was added to ftrace, which is the trace_marker now
  supports triggers. For example:

    # cd /sys/kernel/debug/tracing
    # echo 'snapshot' > events/ftrace/print/trigger
    # echo 'cause snapshot' > trace_marker

  The rest of the changes are various clean ups and also one stable fix
  that was added late in the cycle"

* tag 'trace-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (21 commits)
  tracing: Use match_string() instead of open coding it in trace_set_options()
  branch-check: fix long->int truncation when profiling branches
  ring-buffer: Fix typo in comment
  ring-buffer: Fix a bunch of typos in comments
  tracing/selftest: Add test to test simple snapshot trigger for trace_marker
  tracing/selftest: Add test to test hist trigger between kernel event and trace_marker
  tracing/selftest: Add selftests to test trace_marker histogram triggers
  ftrace/selftest: Fix reset_trigger() to handle triggers with filters
  ftrace/selftest: Have the reset_trigger code be a bit more careful
  tracing: Document trace_marker triggers
  tracing: Allow histogram triggers to access ftrace internal events
  tracing: Prevent further users of zero size static arrays in trace events
  tracing: Have zero size length in filter logic be full string
  tracing: Add trigger file for trace_markers tracefs/ftrace/print
  tracing: Do not show filter file for ftrace internal events
  tracing: Add brackets in ftrace event dynamic arrays
  tracing: Have event_trace_init() called by trace_init_tracefs()
  tracing: Add __find_event_file() to find event files without restrictions
  tracing: Do not reference event data in post call triggers
  tracepoints: Fix the descriptions of tracepoint_probe_register{_prio}
  ...
2018-06-06 16:39:18 -07:00
Linus Torvalds eeee3149aa There's been a fair amount of work in the docs tree this time around,
including:
 
  - Extensive RST conversions and organizational work in the
    memory-management docs thanks to Mike Rapoport.
 
  - An update of Documentation/features from Andrea Parri and a script to
    keep it updated.
 
  - Various LICENSES updates from Thomas, along with a script to check SPDX
    tags.
 
  - Work to fix dangling references to documentation files; this involved a
    fair number of one-liner comment changes outside of Documentation/
 
 ...and the usual list of documentation improvements, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbFTkKAAoJEI3ONVYwIuV6t24P/0K9qltHkLwsBo2fbGu/emem
 mb1QrZCFZGebKVrCIvET3YcT0q0xPW+ZldwMQYEUeCcu/vD3cGHGXlDbVJCa1fFD
 2OS10W/sEObPnREtlHO/zAzpapKP9DO1/f6NhO55iBJLGOCgoLL5xvSqgsI8MTGd
 vcJDXLitkh4CJEcfNLkQt8dEZzq9Tb6wdSFIvZBBXRNon2ItVN92D5xoQ0wtB+qt
 KmcGYofajK9bjtZpnC4iNg3i+zdwkd80bGTEN9f0hJTRZK5emCILk8fip8CMhRuB
 iwmcqb2RnMLydNLyK9RSs6OS5z3G4fYu9llRtLlZBAupcjRVpalWaBGxLOVO6jBG
 mvkqdKPMtxV4c7NvwKwFQL9dcjtxsxO4RDRYVWN82dS1L6WKKk8UvTuJUBLH0YA5
 af7ZKn7mJVhJ1cxPblaEBOBM3oQuk57LLkjmcpMOXyJ/IOkTIuV1Ezht+XzFyFQv
 VWSyekiKo+8D6WHACPTaWiizjW15e8CyP+WIhKzJyn7VQQrZwhsOS+R//ITsuvQ0
 vRdZ20lwUeBhR+mnXd5NfIo2w7G+OiqiREVAgxjgRrS0PnkzWG7lzzcSVU8HTfT4
 S7VXqval2a9Xg+N8aU2JUe49W858J8hKvIa98hBxGoZa84wxOGtEo7pIKhnMwMSe
 Uhkh/1/bQMxsK3fBEF74
 =I6FG
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.18' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "There's been a fair amount of work in the docs tree this time around,
  including:

   - Extensive RST conversions and organizational work in the
     memory-management docs thanks to Mike Rapoport.

   - An update of Documentation/features from Andrea Parri and a script
     to keep it updated.

   - Various LICENSES updates from Thomas, along with a script to check
     SPDX tags.

   - Work to fix dangling references to documentation files; this
     involved a fair number of one-liner comment changes outside of
     Documentation/

  ... and the usual list of documentation improvements, typo fixes, etc"

* tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
  Documentation: document hung_task_panic kernel parameter
  docs/admin-guide/mm: add high level concepts overview
  docs/vm: move ksm and transhuge from "user" to "internals" section.
  docs: Use the kerneldoc comments for memalloc_no*()
  doc: document scope NOFS, NOIO APIs
  docs: update kernel versions and dates in tables
  docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
  docs/vm: transhuge: minor updates
  docs/vm: transhuge: change sections order
  Documentation: arm: clean up Marvell Berlin family info
  Documentation: gpio: driver: Fix a typo and some odd grammar
  docs: ranoops.rst: fix location of ramoops.txt
  scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
  docs: uio-howto.rst: use a code block to solve a warning
  mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
  w1: w1_io.c: fix a kernel-doc warning
  Documentation/process/posting: wrap text at 80 cols
  docs: admin-guide: add cgroup-v2 documentation
  Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
  Documentation: refcount-vs-atomic: Update reference to LKMM doc.
  ...
2018-06-04 12:34:27 -07:00
Steven Rostedt (VMware) d3439f9d6c tracing: Document trace_marker triggers
Add documentation and an example on how to use trace_marker triggers.

Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29 08:29:30 -04:00
Thomas Gleixner a3ed0e4393 Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME
Revert commits

92af4dcb4e ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f43 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
7250a4047a ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
d6c7270e91 ("timekeeping: Remove boot time specific code")
f2d6fdbfd2 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
d6ed449afd ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
72199320d4 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")

As stated in the pull request for the unification of CLOCK_MONOTONIC and
CLOCK_BOOTTIME, it was clear that we might have to revert the change.

As reported by several folks systemd and other applications rely on the
documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
changes. After resume daemons time out and other timeout related issues are
observed. Rafael compiled this list:

* systemd kills daemons on resume, after >WatchdogSec seconds
  of suspending (Genki Sky).  [Verified that that's because systemd uses
  CLOCK_MONOTONIC and expects it to not include the suspend time.]

* systemd-journald misbehaves after resume:
  systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
corrupted or uncleanly shut down, renaming and replacing.
  (Mike Galbraith).

* NetworkManager reports "networking disabled" and networking is broken
  after resume 50% of the time (Pavel).  [May be because of systemd.]

* MATE desktop dims the display and starts the screensaver right after
  system resume (Pavel).

* Full system hang during resume (me).  [May be due to systemd or NM or both.]

That happens on debian and open suse systems.

It's sad, that these problems were neither catched in -next nor by those
folks who expressed interest in this change.

Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reported-by: Genki Sky <sky@genki.is>,
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2018-04-26 14:53:32 +02:00
Steffen Maier 32fb7ef69a Documentation: ftrace: clarify filters with dynamic ftrace and graph
I fell into the trap of having set up function tracer with a very
limited filter and then switched over to function_graph and was
erroneously wondering why the latter did not trace what I expected,
which was the full unabridged graph recursion.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-04-16 14:04:26 -06:00
Linus Torvalds 2a56bb596b New features:
- Tom Zanussi's extended histogram work
    This adds the synthetic events to have histograms from multiple event data
    Adds triggers "onmatch" and "onmax" to call the synthetic events
    Several updates to the histogram code from this
 
  - Allow way to nest ring buffer calls in the same context
 
  - Allow absolute time stamps in ring buffer
 
  - Rewrite of filter code parsing based on Al Viro's suggestions
 
  - Setting of trace_clock to global if TSC is unstable (on boot)
 
  - Better OOM handling when allocating large ring buffers
 
  - Added initcall tracepoints (consolidated initcall_debug code with them)
 
 And other various fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAlrLoCAUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQ1a05Y9njSUks/QwAn/ky8WgfjcRdjKmBYuEwDedvm9iI
 V9G5kpv5JMw5dLz4l1pS3tA3M9Lyuc5z3Shw92FTy36vdU1wxEjQgHa7viB1xk9x
 KsiTyNjTsgrRd7GVHMy/8Be2RRiTRLaXKAsLCoj/c7QWzagV1P8XWlWK5mojYkh/
 DrSXyg9Avkp30+sU1bvcLWnmmZUFqMxs+bWipD9uFc98USMMyeP25nrnhrj0gDTg
 Q93cjXUuyVRC4lJ2YTW0GCSKhMKEw5f/ltEOT1hwScqYkCJj1EubKqS53R/9h21z
 IPUrYcqLnMRu0j2ejR+UAy5Vsy3gJUrPMQb0F6hlu1DwbMd0d/9SGh1c+Sm+zorh
 yftWTdCZsYrXkaOuB6V5M30X+KBwbWO0Xc9VCvgJ/IU5vMlgLSt5itTWbT/Fmfhb
 ll5/RXP7zhSXRv5sdl/BP3/4dd6F8jpyKyaR2Rk2+XjBOGIq5mvqNGr4Vj9AzxW8
 E0nvq7l7e0dbxZNM42gEm3cht1VUg7Zz0Y0+
 =91oN
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New features:

   - Tom Zanussi's extended histogram work.

     This adds the synthetic events to have histograms from multiple
     event data Adds triggers "onmatch" and "onmax" to call the
     synthetic events Several updates to the histogram code from this

   - Allow way to nest ring buffer calls in the same context

   - Allow absolute time stamps in ring buffer

   - Rewrite of filter code parsing based on Al Viro's suggestions

   - Setting of trace_clock to global if TSC is unstable (on boot)

   - Better OOM handling when allocating large ring buffers

   - Added initcall tracepoints (consolidated initcall_debug code with
     them)

  And other various fixes and clean ups"

* tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
  init: Have initcall_debug still work without CONFIG_TRACEPOINTS
  init, tracing: Have printk come through the trace events for initcall_debug
  init, tracing: instrument security and console initcall trace events
  init, tracing: Add initcall trace events
  tracing: Add rcu dereference annotation for test func that touches filter->prog
  tracing: Add rcu dereference annotation for filter->prog
  tracing: Fixup logic inversion on setting trace_global_clock defaults
  tracing: Hide global trace clock from lockdep
  ring-buffer: Add set/clear_current_oom_origin() during allocations
  ring-buffer: Check if memory is available before allocation
  lockdep: Add print_irqtrace_events() to __warn
  vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)
  tracing: Uninitialized variable in create_tracing_map_fields()
  tracing: Make sure variable string fields are NULL-terminated
  tracing: Add action comparisons when testing matching hist triggers
  tracing: Don't add flag strings when displaying variable references
  tracing: Fix display of hist trigger expressions containing timestamps
  ftrace: Drop a VLA in module_exists()
  tracing: Mention trace_clock=global when warning about unstable clocks
  tracing: Default to using trace_global_clock if sched_clock is unstable
  ...
2018-04-10 11:27:30 -07:00
Linus Torvalds 680014d6d1 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time(r) updates from Thomas Gleixner:
 "A small set of updates for timers and timekeeping:

   - The most interesting change is the consolidation of clock MONOTONIC
     and clock BOOTTIME.

     Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
     not longer ignore the time spent in suspend. A new clock
     MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
     kernels before this change. This allows applications to
     programmatically check for the clock MONOTONIC behaviour.

     As discussed in the review thread, this has the potential of
     breaking user space and we might have to revert this. Knock on wood
     that we can avoid that exercise.

   - Updates to the NTP mechanism to improve accuracy

   - A new kernel internal data structure to aid the ongoing Y2038 work.

   - Cleanups and simplifications of the clocksource code.

   - Make the alarmtimer code play nicely with debugobjects"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Init nanosleep alarm timer on stack
  y2038: Introduce struct __kernel_old_timeval
  tracing: Unify the "boot" and "mono" tracing clocks
  hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
  posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
  timekeeping: Remove boot time specific code
  Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
  timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
  timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
  timekeeping/ntp: Determine the multiplier directly from NTP tick length
  timekeeping/ntp: Don't align NTP frequency adjustments to ticks
  clocksource: Use ATTRIBUTE_GROUPS
  clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
  clocksource: Don't walk the clocksource list for empty override
2018-04-04 14:50:29 -07:00
Jonathan Corbet 6234c7bd8c docs: ftrace: fix a few formatting issues
Make sure that literal * characters are not interpreted as emphasis
markers.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-03-07 10:44:08 -07:00
Changbin Du 1f198e22bc trace doc: convert trace/ftrace.txt to rst format
This converts the plain text documentation to reStructuredText format and
add it into Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-03-07 10:23:10 -07:00