Commit Graph

27 Commits

Author SHA1 Message Date
Ping Gan 15fcdf6ae1 tcp: Add tracepoint for tcp_set_ca_state
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.

Link: https://github.com/iovisor/bcc/pull/3899

Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 20:33:15 -07:00
Eric Dumazet 4057037535 tcp: add accessors to read/set tp->snd_cwnd
We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff7 ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 12:05:41 -07:00
Jakub Kicinski 709c031423 tcp: add tracepoint for checksum errors
Add a tracepoint for capturing TCP segments with
a bad checksum. This makes it easy to identify
sources of bad frames in the fleet (e.g. machines
with faulty NICs).

It should also help tools like IOvisor's tcpdrop.py
which are used today to get detailed information
about such packets.

We don't have a socket in many cases so we must
open code the address extraction based just on
the skb.

v2: add missing export for ipv6=m

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:26:03 -07:00
Hariharan Ananthakrishnan 3dd344ea84 net: tracepoint: exposing sk_family in all tcp:tracepoints
Similar to sock:inet_sock_set_state tracepoint, expose sk_family to
distinguish AF_INET and AF_INET6 families.

The following tcp tracepoints are updated:
tcp:tcp_destroy_sock
tcp:tcp_rcv_space_adjust
tcp:tcp_retransmit_skb
tcp:tcp_send_reset
tcp:tcp_receive_reset
tcp:tcp_retransmit_synack
tcp:tcp_probe

Signed-off-by: Hariharan Ananthakrishnan <hari@netflix.com>
Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Link: https://lore.kernel.org/r/20210129001210.344438-1-hari@netflix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 09:25:36 -08:00
Tony Lu dd3d792def tcp: remove redundant new line from tcp_event_sk_skb
This removes '\n' from trace event class tcp_event_sk_skb to avoid
redundant new blank line and make output compact.

Fixes: af4325ecc2 ("tcp: expose sk_state in tcp_retransmit_skb tracepoint")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09 19:41:50 -08:00
Yafang Shao af4325ecc2 tcp: expose sk_state in tcp_retransmit_skb tracepoint
After sk_state exposed, we can get in which state this retransmission
occurs. That could give us more detail for dignostic.
For example, if this retransmission occurs in SYN_SENT state, it may
also indicates that the syn packet may be dropped on the remote peer due
to syn backlog queue full and then we could check the remote peer.

BTW,SYNACK retransmission is traced in tcp_retransmit_synack tracepoint.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:07:19 -07:00
Yafang Shao 3d97d88e80 tcp: minor optimization around tcp_hdr() usage in receive path
This is additional to the
commit ea1627c20c ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.

Remove the third parameter of tcp_rcv_established() and put it into
the function body.

Furthermore, the local variables are listed as a reverse christmas tree :)

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 13:20:47 -04:00
Yafang Shao 2d68c0745a tcp: use data length instead of skb->len in tcp_probe
skb->len is meaningless to user.
data length could be more helpful, with which we can easily filter out
the packet without payload.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:15:37 -04:00
Yafang Shao 6163849d28 net: introduce a new tracepoint for tcp_rcv_space_adjust
tcp_rcv_space_adjust is called every time data is copied to user space,
introducing a tcp tracepoint for which could show us when the packet is
copied to user.

When a tcp packet arrives, tcp_rcv_established() will be called and with
the existed tracepoint tcp_probe we could get the time when this packet
arrives.
Then this packet will be copied to user, and tcp_rcv_space_adjust will
be called and with this new introduced tracepoint we could get the time
when this packet is copied to user.
With these two tracepoints, we could figure out whether the user program
processes this packet immediately or there's latency.

Hence in the printk message, sk_cookie is printed as a key to relate
tcp_rcv_space_adjust with tcp_probe.

Maybe we could export sockfd in this new tracepoint as well, then we
could relate this new tracepoint with epoll/read/recv* tracepoints, and
finally that could show us the whole lifespan of this packet. But we
could also implement that with pid as these functions are executed in
process context.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23 09:58:18 -04:00
Andrey Ignatov ef53e9e147 net: Remove unused tcp_set_state tracepoint
This tracepoint was replaced by inet_sock_set_state in 563e0bb and not
used anywhere in the kernel anymore. Remove it.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-16 19:02:15 -04:00
Masami Hiramatsu ee549be6f0 net: dccp: Add DCCP sendmsg trace event
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:30 -05:00
Masami Hiramatsu c3fde1bd28 net: tcp: Add trace events for TCP congestion window tracing
This adds an event to trace TCP stat variables with
slightly intrusive trace-event. This uses ftrace/perf
event log buffer to trace those state, no needs to
prepare own ring-buffer, nor custom user apps.

User can use ftrace to trace this event as below;

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/tcp/tcp_probe/enable
  (run workloads)
  # cat trace

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:29 -05:00
David S. Miller 6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Mat Martineau 6a6b0b9914 tcp: Avoid preprocessor directives in tracepoint macro args
Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
of errors:

./include/trace/events/tcp.h:68:1: error: directive in argument list
./include/trace/events/tcp.h:75:1: error: directive in argument list
./include/trace/events/tcp.h:144:1: error: directive in argument list
./include/trace/events/tcp.h:151:1: error: directive in argument list
./include/trace/events/tcp.h:216:1: error: directive in argument list
./include/trace/events/tcp.h:223:1: error: directive in argument list
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list

Once sparse finds an error, it stops printing warnings for the file it
is checking. This masks any sparse warnings that would normally be
reported for the core TCP code.

Instead, handle the preprocessor conditionals in a couple of auxiliary
macros. This also has the benefit of reducing duplicate code.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-26 17:25:22 -05:00
Yafang Shao 563e0bb0dc net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint
As sk_state is a common field for struct sock, so the state
transition tracepoint should not be a TCP specific feature.
Currently it traces all AF_INET state transition, so I rename this
tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
into trace/events/sock.h.
We dont need to create a file named trace/events/inet_sock.h for this one single
tracepoint.

Two helpers are introduced to trace sk_state transition
    - void inet_sk_state_store(struct sock *sk, int newstate);
    - void inet_sk_set_state(struct sock *sk, int state);
As trace header should not be included in other header files,
so they are defined in sock.c.

The protocol such as SCTP maybe compiled as a ko, hence export
inet_sk_set_state().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20 14:00:25 -05:00
Steven Rostedt (VMware) d7b850a7de tcp: Export to userspace the TCP state names for the trace events
The TCP trace events (specifically tcp_set_state), maps emums to symbol
names via __print_symbolic(). But this only works for reading trace events
from the tracefs trace files. If perf or trace-cmd were to record these
events, the event format file does not convert the enum names into numbers,
and you get something like:

__print_symbolic(REC->oldstate,
    { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
    { TCP_SYN_SENT, "TCP_SYN_SENT" },
    { TCP_SYN_RECV, "TCP_SYN_RECV" },
    { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
    { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
    { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
    { TCP_CLOSE, "TCP_CLOSE" },
    { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
    { TCP_LAST_ACK, "TCP_LAST_ACK" },
    { TCP_LISTEN, "TCP_LISTEN" },
    { TCP_CLOSING, "TCP_CLOSING" },
    { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })

Where trace-cmd and perf do not know the values of those enums.

Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
the enum strings into their values at system boot. This will allow perf and
trace-cmd to see actual numbers and not enums:

__print_symbolic(REC->oldstate,
    { 1, "TCP_ESTABLISHED" },
    { 2, "TCP_SYN_SENT" },
    { 3, "TCP_SYN_RECV" },
    { 4, "TCP_FIN_WAIT1" },
    { 5, "TCP_FIN_WAIT2" },
    { 6, "TCP_TIME_WAIT" },
    { 7, "TCP_CLOSE" },
    { 8, "TCP_CLOSE_WAIT" },
    { 9, "TCP_LAST_ACK" },
    { 10, "TCP_LISTEN" },
    { 11, "TCP_CLOSING" },
    { 12, "TCP_NEW_SYN_RECV" })

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20 14:00:24 -05:00
Song Liu cf34ce3da1 tcp: add tracepoint trace_tcp_retransmit_synack()
This tracepoint can be used to trace synack retransmits. It maintains
pointer to struct request_sock.

We cannot simply reuse trace_tcp_retransmit_skb() here, because the
sk here is the LISTEN socket. The IP addresses and ports should be
extracted from struct request_sock.

Note that, like many other tracepoints, this patch uses IS_ENABLED
in TP_fast_assign macro, which triggers sparse warning like:

./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list

However, there is no good solution to avoid these warnings. To the
best of our knowledge, these warnings are harmless.

Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 10:12:45 +09:00
Song Liu e8fce23946 tcp: add tracepoint trace_tcp_set_state()
This patch adds tracepoint trace_tcp_set_state. Besides usual fields
(s/d ports, IP addresses), old and new state of the socket is also
printed with TP_printk, with __print_symbolic().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
Song Liu e1a4aa50f4 tcp: add tracepoint trace_tcp_destroy_sock
This patch adds trace event trace_tcp_destroy_sock.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
Song Liu 5941521c05 tcp: add tracepoint trace_tcp_receive_reset
New tracepoint trace_tcp_receive_reset is added and called from
tcp_reset(). This tracepoint is define with a new class tcp_event_sk.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
Song Liu c24b14c46b tcp: add tracepoint trace_tcp_send_reset
New tracepoint trace_tcp_send_reset is added and called from
tcp_v4_send_reset(), tcp_v6_send_reset() and tcp_send_active_reset().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
Song Liu 7344e29f28 tcp: mark trace event arguments sk and skb as const
Some functions that we plan to add trace points require const sk
and/or skb. So we mark these fields as const in the tracepoint.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
Song Liu f6e37b2541 tcp: add trace event class tcp_event_sk_skb
Introduce event class tcp_event_sk_skb for tcp tracepoints that
have arguments sk and skb.

Existing tracepoint trace_tcp_retransmit_skb() falls into this class.
This patch rewrites the definition of trace_tcp_retransmit_skb() with
tcp_event_sk_skb.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:21:25 +01:00
David Ahern 890056783c tcp: Remove use of inet6_sk and add IPv6 checks to tracepoint
386fd5da40 ("tcp: Check daddr_cache before use in tracepoint") was the
second version of the tracepoint fixup patch. This patch is the delta
between v2 and v3.  Specifically, remove the use of inet6_sk and check
sk_family as requested by Eric and add IS_ENABLED(CONFIG_IPV6) around
the use of sk_v6_rcv_saddr and sk_v6_daddr as done in sock_common (noted
by Cong).

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:04:58 +01:00
David Ahern 386fd5da40 tcp: Check daddr_cache before use in tracepoint
Running perf in one window to capture tcp_retransmit_skb tracepoint:
    $ perf record -e tcp:tcp_retransmit_skb -a

And causing a retransmission on an active TCP session (e.g., dropping
packets in the receiver, changing MTU on the interface to 500 and back
to 1500) triggers a panic:

[   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.546770] PGD 0 P4D 0
[   58.547472] Oops: 0000 [#1] SMP
[   58.548328] Modules linked in: vrf
[   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
[   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
[   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
[   58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
[   58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
[   58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
[   58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
[   58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
[   58.563845] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   58.564873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
[   58.566538] Call Trace:
[   58.566865]  <IRQ>
[   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
[   58.567704]  ? tcp_set_ca_state+0x22/0x3f
[   58.568231]  tcp_retransmit_skb+0x14/0xa3
[   58.568754]  tcp_retransmit_timer+0x472/0x5e3
[   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
[   58.569946]  tcp_write_timer_handler+0x95/0x1e9
[   58.570548]  tcp_write_timer+0x2a/0x58

Check that daddr_cache is non-NULL before de-referencing.

Fixes: e086101b15 ("tcp: add a tracepoint for tcp retransmission")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18 14:15:14 +01:00
David Ahern fb6ff75e18 tcp: Use pI6c in tcp tracepoint
The compact form for IPv6 addresses is more user friendly than the full
version. For example:
   compact: 2001:db8:1::1
      full: 2001:0db8:0001:0000:0000:0000:0000:0004i

Update the tcp tracepoint to show the compact form.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18 14:10:29 +01:00
Cong Wang e086101b15 tcp: add a tracepoint for tcp retransmission
We need a real-time notification for tcp retransmission
for monitoring.

Of course we could use ftrace to dynamically instrument this
kernel function too, however we can't retrieve the connection
information at the same time, for example perf-tools [1] reads
/proc/net/tcp for socket details, which is slow when we have
a lots of connections.

Therefore, this patch adds a tracepoint for __tcp_retransmit_skb()
and exposes src/dst IP addresses and ports of the connection.
This also makes it easier to integrate into perf.

Note, I expose both IPv4 and IPv6 addresses at the same time:
for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
Also, add sk and skb pointers as they are useful for BPF.

1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans

Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14 18:45:15 -07:00