linux-stable/net/ipv4
Jerry Chu bf5a755f5e net-gre-gro: Add GRE support to the GRO stack
This patch built on top of Commit 299603e837
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.

The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.

Note that commit 60769a5dcd "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)

An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.

The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%

The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 16:21:31 -05:00
..
netfilter Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables 2014-01-06 13:29:30 -05:00
Kconfig net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
Makefile gre_offload: statically build GRE offloading support 2014-01-06 20:28:34 -05:00
af_inet.c net-gre-gro: Add GRE support to the GRO stack 2014-01-07 16:21:31 -05:00
ah4.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
arp.c ipv4: arp: update neighbour address when a gratuitous arp is received and arp_accept is set 2014-01-02 00:08:38 -05:00
cipso_ipv4.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
datagram.c net: Remove FLOWI_FLAG_CAN_SLEEP 2013-12-06 07:24:39 +01:00
devinet.c ipv4: loopback device: ignore value changes after device is upped 2014-01-07 15:55:17 -05:00
esp4.c net: esp{4,6}: get rid of struct esp_data 2013-10-29 06:39:42 +01:00
fib_frontend.c fib_trie: remove duplicated rcu lock 2013-10-18 13:53:59 -04:00
fib_lookup.h ipv4: make fib_detect_death static 2013-12-28 17:01:46 -05:00
fib_rules.c inet: fix NULL pointer Oops in fib(6)_rule_suppress 2013-12-10 17:54:23 -05:00
fib_semantics.c ipv4: make fib_detect_death static 2013-12-28 17:01:46 -05:00
fib_trie.c seq_file: remove "%n" usage from seq_file users 2013-11-15 09:32:20 +09:00
gre_demux.c gre_offload: statically build GRE offloading support 2014-01-06 20:28:34 -05:00
gre_offload.c net-gre-gro: Add GRE support to the GRO stack 2014-01-07 16:21:31 -05:00
icmp.c ipv4: new ip_no_pmtu_disc mode to always discard incoming frag needed msgs 2013-12-18 16:58:20 -05:00
igmp.c ipv4: fix all space errors in file igmp.c 2013-12-26 13:43:21 -05:00
inet_connection_sock.c inet: rename ir_loc_port to ir_num 2013-10-10 14:37:35 -04:00
inet_diag.c net: inet_diag: zero out uninitialized idiag_{src,dst} fields 2013-12-19 14:55:52 -05:00
inet_fragment.c inet: remove old fragmentation hash initializing 2013-10-23 17:01:41 -04:00
inet_hashtables.c inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once 2013-10-19 19:45:35 -04:00
inet_lro.c lro: remove dead code 2013-12-29 16:34:25 -05:00
inet_timewait_sock.c tcp/dccp: remove twchain 2013-10-08 23:19:24 -04:00
inetpeer.c ipv4: remove unused function 2013-12-28 17:03:20 -05:00
ip_forward.c
ip_fragment.c net: Add utility functions to clear rxhash 2013-12-17 16:36:21 -05:00
ip_gre.c ip_gre: fix msg_name parsing for recvfrom/recvmsg 2013-12-18 17:44:33 -05:00
ip_input.c net: add SNMP counters tracking incoming ECN bits 2013-08-08 22:24:59 -07:00
ip_options.c ipv4: switch and case should be at the same indent 2014-01-02 03:30:36 -05:00
ip_output.c ipv4: consistent reporting of pmtu data in case of corking 2013-12-22 18:52:09 -05:00
ip_sockglue.c net: more spelling fixes 2013-12-10 21:57:11 -05:00
ip_tunnel.c net: unify the pcpu_tstats and br_cpu_netstats as one 2014-01-04 20:10:24 -05:00
ip_tunnel_core.c net: Add utility functions to clear rxhash 2013-12-17 16:36:21 -05:00
ip_vti.c net: unify the pcpu_tstats and br_cpu_netstats as one 2014-01-04 20:10:24 -05:00
ipcomp.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
ipconfig.c ipconfig: add informative timeout messages while waiting for carrier 2013-04-02 14:35:33 -04:00
ipip.c ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
ipmr.c neigh: restore old behaviour of default parms values 2013-12-09 20:56:12 -05:00
netfilter.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
ping.c ipv4: ping make local stuff static 2013-12-28 17:05:45 -05:00
proc.c ipv4: spaces required around that '=' 2014-01-02 03:30:36 -05:00
protocol.c net: remove outdated comment for ipv4 and ipv6 protocol handler 2013-11-28 18:47:51 -05:00
raw.c net: Remove FLOWI_FLAG_CAN_SLEEP 2013-12-06 07:24:39 +01:00
route.c ipv4: fix race in concurrent ip_route_input_slow() 2013-11-20 15:28:44 -05:00
syncookies.c ipv4: fix checkpatch error "space prohibited" 2013-12-26 13:43:21 -05:00
sysctl_net_ipv4.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
tcp.c tcp: out_of_order_queue do not use its lock 2014-01-06 16:34:34 -05:00
tcp_bic.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_cong.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_cubic.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_diag.c
tcp_fastopen.c tcp: enable sockets to use MSG_FASTOPEN by default 2013-11-04 19:57:47 -05:00
tcp_highspeed.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_htcp.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_hybla.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_illinois.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_input.c tcp: make local functions static 2013-12-29 16:34:24 -05:00
tcp_ipv4.c ipv4: fix checkpatch error with foo * bar 2013-12-26 13:43:21 -05:00
tcp_lp.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_memcontrol.c tcp_memcontrol: Cleanup/fix cg_proto->memory_pressure handling. 2013-12-05 21:01:01 -05:00
tcp_metrics.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
tcp_minisocks.c tcp: out_of_order_queue do not use its lock 2014-01-06 16:34:34 -05:00
tcp_offload.c net-gre-gro: Add GRE support to the GRO stack 2014-01-07 16:21:31 -05:00
tcp_output.c tcp: make local functions static 2013-12-29 16:34:24 -05:00
tcp_probe.c ipv4: ERROR: do not initialise globals to 0 or NULL 2013-12-26 13:43:21 -05:00
tcp_scalable.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_timer.c tcp: temporarily disable Fast Open on SYN timeout 2013-10-29 22:50:41 -04:00
tcp_vegas.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_vegas.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
tcp_veno.c tcp: properly handle stretch acks in slow start 2013-11-04 19:57:59 -05:00
tcp_westwood.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_yeah.c ipv4: ipv4: Cleanup the comments in tcp_yeah.c 2013-12-26 13:43:55 -05:00
tunnel4.c
udp.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-06 17:37:45 -05:00
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h net: ipv4/ipv6: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
udp_offload.c ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC 2014-01-02 19:06:47 -05:00
udplite.c
xfrm4_input.c
xfrm4_mode_beet.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2013-09-30 15:24:57 -04:00
xfrm4_output.c xfrm: revert ipv4 mtu determination to dst_mtu 2013-08-26 12:40:53 +02:00
xfrm4_policy.c xfrm: Fix null pointer dereference when decoding sessions 2013-11-01 07:08:46 +01:00
xfrm4_state.c inet: make no_pmtu_disc per namespace and kill ipv4_config 2013-12-18 16:58:20 -05:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00