Commit Graph

165 Commits

Author SHA1 Message Date
Hangbin Liu 52d0d404d3 geneve: add ttl inherit support
Similar with commit 72f6d71e49 ("vxlan: add ttl inherit support"),
currently ttl == 0 means "use whatever default value" on geneve instead
of inherit inner ttl. To respect compatibility with old behavior, let's
add a new IFLA_GENEVE_TTL_INHERIT for geneve ttl inherit support.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:38:22 -07:00
David S. Miller 5cd3da4ba2 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Simple overlapping changes in stmmac driver.

Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-03 10:29:26 +09:00
Sabrina Dubroca 603d4cf8fe net: fix use-after-free in GRO with ESP
Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.

Commit 5f114163f2 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.

This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.

Fixes: 5f114163f2 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 20:34:04 +09:00
Pieter Jansen van Vuuren 256c87c17c net: check tunnel option type in tunnel flags
Check the tunnel option type stored in tunnel flags when creating options
for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.

Make sure all users of the infrastructure set correct flags, for the BPF
helper we have to set all bits to keep backward compatibility.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
David Miller d4546c2509 net: Convert GRO SKB handling to list_head.
Manage pending per-NAPI GRO packets via list_head.

Return an SKB pointer from the GRO receive handlers.  When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.

Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 11:33:04 +09:00
Alexey Kodanev c40e89fd35 geneve: configure MTU based on a lower device
Currently, on a new link creation or when 'remote' address parameter
is updated, an MTU is not changed and always equals 1500. When a lower
device has a larger MTU, it might not be efficient, e.g. for UDP, and
requires the manual MTU adjustments to match the MTU of the lower
device.

This patch tries to automate this process, finds a lower device using
the 'remote' address parameter, then uses its MTU to tune GENEVE's MTU:
  * on a new link creation
  * when 'remote' parameter is changed

Also with this patch, the MTU from a user, on a new link creation, is
passed to geneve_change_mtu() where it is verified, and MTU adjustments
with a lower device is skipped in that case. Prior that change, it was
possible to set the invalid MTU values on a new link creation.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20 11:20:05 -04:00
Alexey Kodanev 321acc1c68 geneve: check MTU for a minimum in geneve_change_mtu()
geneve_change_mtu() will be used not only as ndo_change_mtu() callback,
but also to verify a user specified MTU on a new link creation in the
next patch.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20 11:20:05 -04:00
Alexey Kodanev 5edbea6987 geneve: cleanup hard coded value for Ethernet header length
Use ETH_HLEN instead and introduce two new macros: GENEVE_IPV4_HLEN
and GENEVE_IPV6_HLEN that include Ethernet header length, corresponded
IP header length and GENEVE_BASE_HLEN.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20 11:20:05 -04:00
Alexey Kodanev 4c52a889ab geneve: remove white-space before '#if IS_ENABLED(CONFIG_IPV6)'
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-20 11:20:05 -04:00
David S. Miller 3e3ab9ccca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 10:15:51 -05:00
Nicolas Dichtel f15ca723c1 net: don't call update_pmtu unconditionally
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at           (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d51f ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff44 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 16:27:34 -05:00
David S. Miller a0ce093180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-09 10:37:00 -05:00
Xin Long 52a589d51f geneve: update skb dst pmtu on tx path
Commit a93bf0ff44 ("vxlan: update skb dst pmtu on tx path") has fixed
a performance issue caused by the change of lower dev's mtu for vxlan.

The same thing needs to be done for geneve as well.

Note that geneve cannot adjust it's mtu according to lower dev's mtu
when creating it. The performance is very low later when netperfing
over it without fixing the mtu manually. This patch could also avoid
this issue.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:34:59 -05:00
Haishuang Yan 2843a25348 geneve: speedup geneve tunnels dismantle
Since we now hold RTNL lock in geneve_exit_net, it's better batch them
to speedup geneve tunnel dismantle.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 10:59:44 -05:00
Hangbin Liu f9094b7603 geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't
make sense if we haven't enabled CONFIG_IPV6. Fix it by adding
if IS_ENABLED(CONFIG_IPV6) check.

Fixes: abe492b4f5 ("geneve: UDP checksum configuration via netlink")
Fixes: fd7eafd021 ("geneve: fix fill_info when link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24 03:29:22 +09:00
Hangbin Liu fd7eafd021 geneve: fix fill_info when link down
geneve->sock4/6 were added with geneve_open and released with geneve_stop.
So when geneve link down, we will not able to show remote address and
checksum info after commit 11387fe4a9 ("geneve: fix fill_info when using
collect_metadata").

Fix this by avoid passing *_REMOTE{,6} for COLLECT_METADATA since they are
mutually exclusive, and always show UDP_ZERO_CSUM6_RX info.

Fixes: 11387fe4a9 ("geneve: fix fill_info when using collect_metadata")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 19:47:11 +09:00
Vasily Averin ab384b63c7 geneve: exit_net cleanup check added
Be sure that sock_list initialized in net_init hook was return
to initial state.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 15:45:52 +09:00
David S. Miller f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Stefano Brivio 3fa5f11de1 geneve: Get rid of is_all_zero(), streamline is_tnl_info_zero()
No need to re-invent memchr_inv() with !is_all_zero(). While at
it, replace conditional and return clauses with a single return
clause in is_tnl_info_zero().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:42:39 +01:00
Stefano Brivio 772e97b57a geneve: Fix function matching VNI and tunnel ID on big-endian
On big-endian machines, functions converting between tunnel ID
and VNI use the three LSBs of tunnel ID storage to map VNI.

The comparison function eq_tun_id_and_vni(), on the other hand,
attempted to map the VNI from the three MSBs. Fix it by using
the same check implemented on LE, which maps VNI from the three
LSBs of tunnel ID.

Fixes: 2e0b26e103 ("geneve: Optimize geneve device lookup.")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 02:50:42 +01:00
Girish Moodalbail c5ebc4409f geneve: use netlink_ext_ack for error reporting in rtnl operations
Add extack error messages for failure paths while creating/modifying
geneve devices. Once extack support is added to iproute2, more
meaningful and helpful error messages will be displayed making it easy
for users to discern what went wrong.

Before:

=======
$ ip link add gen1 address 0:1:2:3:4:5:6 type geneve id 200 \
  remote 192.168.13.2
RTNETLINK answers: Invalid argument

After:
======
$ ip link add gen1 address 0:1:2:3:4:5:6 type geneve id 200 \
  remote 192.168.13.2
Error: Provided link layer address is not Ethernet

Also, netdev_dbg() calls used to log errors associated with Netlink
request have been removed.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-11 13:45:02 -07:00
David S. Miller 3b2b69efec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mainline had UFO fixes, but UFO is removed in net-next so we
take the HEAD hunks.

Minor context conflict in bcmsysport statistics bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10 12:11:16 -07:00
Girish Moodalbail 04db70d9fe geneve: maximum value of VNI cannot be used
Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
of values for it would be from 0 to 16777215 (2^24 -1).  However, one
cannot create a geneve device with VNI set to 16777215. This patch fixes
this issue.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 22:41:04 -07:00
Sabrina Dubroca 04584957b5 geneve/vxlan: offload ports on register/unregister events
This improves consistency of handling when moving a netdev to another
netns. Most drivers currently do a full reset when the device goes up,
so that will flush the offload state anyway.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Sabrina Dubroca 2d2b13fcff geneve/vxlan: add support for NETDEV_UDP_TUNNEL_DROP_INFO
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:52:59 -07:00
Girish Moodalbail 5b861f6baa geneve: add rtnl changelink support
This patch adds changelink rtnl operation support for geneve devices
and the code changes involve:

  - added geneve_quiesce() which quiesces the geneve device data path
    for both TX and RX. This lets us perform the changelink operation
    atomically w.r.t data path. Also added geneve_unquiesce() to
    reverse the operation of geneve_quiesce().

  - refactor geneve_newlink into geneve_nl2info to be used by both
    geneve_newlink and geneve_changelink

  - geneve_nl2info takes a changelink boolean argument to isolate
    changelink checks.

  - Allow changing only a few attributes (ttl, tos, and remote tunnel
    endpoint IP address (within the same address family)):
    - return -EOPNOTSUPP for attributes that cannot be changed for
      now. Incremental patches can make the non-supported one
      available in the future if needed.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 13:51:05 -07:00
Jiri Benc 4b4c21fad6 geneve: fix hlist corruption
It's not a good idea to add the same hlist_node to two different hash lists.
This leads to various hard to debug memory corruptions.

Fixes: 8ed66f0e82 ("geneve: implement support for IPv6-based tunnels")
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:36:27 -07:00
Matthias Schiffer a8b8a889e3 net: add netlink_ext_ack argument to rtnl_link_ops.validate
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:22 -04:00
Matthias Schiffer 7a3f4a1851 net: add netlink_ext_ack argument to rtnl_link_ops.newlink
Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26 23:13:21 -04:00
Johannes Berg d58ff35122 networking: make skb_push & __skb_push return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:40 -04:00
David S. Miller 0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Girish Moodalbail fe741e2362 geneve: add missing rx stats accounting
There are few places on the receive path where packet drops and packet
errors were not accounted for. This patch fixes that issue.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 12:49:57 -04:00
David S. Miller cf124db566 net: Fix inconsistent teardown and release of private netdev state.
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:53:24 -04:00
Eric Garver 9a1c44d989 geneve: fix needed_headroom and max_mtu for collect_metadata
Since commit 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.

For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 20:03:09 -04:00
Eric Garver 11387fe4a9 geneve: fix fill_info when using collect_metadata
Since 9b4437a5b8 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.

Fix by checking for the presence of the actual sockets.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 12:51:16 -04:00
Girish Moodalbail 5e0740c445 geneve: fix incorrect setting of UDP checksum flag
Creating a geneve link with 'udpcsum' set results in a creation of link
for which UDP checksum will NOT be computed on outbound packets, as can
be seen below.

11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0
    geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum

Similarly, creating a link with 'noudpcsum' set results in a creation
of link for which UDP checksum will be computed on outbound packets.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30 22:31:16 -04:00
Jakub Kicinski a717e3f740 geneve: lock RCU on TX path
There is no guarantees that callers of the TX path will hold
the RCU lock.  Grab it explicitly.

Fixes: fceb9c3e38 ("geneve: avoid using stale geneve socket.")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 09:58:31 -08:00
Haishuang Yan 31ac1c1945 geneve: fix ip_hdr_len reserved for geneve6 tunnel.
It shold reserved sizeof(ipv6hdr) for geneve in ipv6 tunnel.

Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:14:49 -05:00
pravin shelar 2e0b26e103 geneve: Optimize geneve device lookup.
Rather than comparing 64-bit tunnel-id, compare tunnel vni
which is 24-bit id. This also save conversion from vni
to tunnel id on each tunnel packet receive.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 14:05:50 -05:00
pravin shelar bcceeec3cc geneve: Remove redundant socket checks.
Geneve already has check for device socket in route
lookup function. So no need to check it in xmit
function.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 14:05:49 -05:00
pravin shelar c3ef5aa5e5 geneve: Merge ipv4 and ipv6 geneve_build_skb()
There are minimal difference in building Geneve header
between ipv4 and ipv6 geneve tunnels. Following patch
refactors code to unify it.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 14:05:49 -05:00
pravin shelar 9b4437a5b8 geneve: Unify LWT and netdev handling.
Current geneve implementation has two separate cases to handle.
1. netdev xmit
2. LWT xmit.

In case of netdev, geneve configuration is stored in various
struct geneve_dev members. For example geneve_addr, ttl, tos,
label, flags, dst_cache, etc. For LWT ip_tunnel_info is passed
to the device in ip_tunnel_info.

Following patch uses ip_tunnel_info struct to store almost all
of configuration of a geneve netdevice. This allows us to unify
most of geneve driver code around ip_tunnel_info struct.
This dramatically simplify geneve code, since it does not
need to handle two different configuration cases. Removes
duplicate code, single code path can handle either type
of geneve devices.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 14:05:49 -05:00
Alexey Dobriyan c7d03a00b5 netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 10:59:15 -05:00
David S. Miller 27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
pravin shelar fceb9c3e38 geneve: avoid using stale geneve socket.
This patch is similar to earlier vxlan patch.
Geneve device close operation frees geneve socket. This
operation can race with geneve-xmit function which
dereferences geneve socket. Following patch uses RCU
mechanism to avoid this situation.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 20:56:31 -04:00
Jarod Wilson 91572088e3 net: use core MTU range checking in core net infra
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
  closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function
- This one is also not as straight-forward and could use closer inspection
  and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Sabrina Dubroca fcd91dd449 net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.

Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c2 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:32:22 -04:00
Sabrina Dubroca e5de25dce9 drivers/net: fixup comments after "Future-proof tunnel offload handlers"
Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:42:11 -07:00
David S. Miller 30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
Haishuang Yan d5d5e8d557 geneve: fix max_mtu setting
For ipv6+udp+geneve encapsulation data, the max_mtu should subtract
sizeof(ipv6hdr), instead of sizeof(iphdr).

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 14:50:22 -07:00
David S. Miller ee58b57100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:03:36 -04:00
Haishuang Yan efeb2267bb geneve: fix tx_errors statistics
Tx errors present summation of errors encountered while transmitting
packets.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-23 15:16:12 -04:00
Alexander Duyck 7c46a640de net: Merge VXLAN and GENEVE push notifiers into a single notifier
This patch merges the notifiers for VXLAN and GENEVE into a single UDP
tunnel notifier.  The idea is that we will want to only have to make one
notifier call to receive the list of ports for VXLAN and GENEVE tunnels
that need to be offloaded.

In addition we add a new set of ndo functions named ndo_udp_tunnel_add and
ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data
such as port and address family as tunnels are added and removed.  The
tunnel meta-data is now transported in a structure named udp_tunnel_info
which for now carries the type, address family, and port number.  In the
future this could be updated so that we can include a tuple of values
including things such as the destination IP address and other fields.

I also ended up going with a naming scheme that consisted of using the
prefix udp_tunnel on function names.  I applied this to the notifier and
ndo ops as well so that it hopefully points to the fact that these are
primarily used in the udp_tunnel functions.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Alexander Duyck e7b3db5e60 net: Combine GENEVE and VXLAN port notifiers into single functions
This patch merges the GENEVE and VXLAN code so that both functions pass
through a shared code path.  This way we can start the effort of using a
single function on the network device drivers to handle both of these
tunnel types.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Alexander Duyck 86a9805725 vxlan/geneve: Include udp_tunnel.h in vxlan/geneve.h and fixup includes
This patch makes it so that we add udp_tunnel.h to vxlan.h and geneve.h
header files.  This is useful as I plan to move the generic handlers for
the port offloads into the udp_tunnel header file and leave the vxlan and
geneve headers to be a bit more protocol specific.

I also went through and cleaned out a number of redundant includes that
where in the .h and .c files for these drivers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:29 -07:00
Nicolas Dichtel 41009481b6 ovs/geneve: fix rtnl notifications on iface deletion
The function geneve_dev_create_fb() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: e305ac6cf5 ("geneve: Add support to collect tunnel metadata.")
CC: Pravin B Shelar <pshelar@nicira.com>
CC: Jesse Gross <jesse@nicira.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14 22:21:45 -07:00
Nicolas Dichtel 106da663ff ovs/gre,geneve: fix error path when creating an iface
After ipgre_newlink()/geneve_configure() call, the netdev is registered.

Fixes: 7e059158d5 ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices")
CC: David Wragg <david@weave.works>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14 22:21:44 -07:00
Hannes Frederic Sowa e5aed006be udp: prevent skbs lingering in tunnel socket queues
In case we find a socket with encapsulation enabled we should call
the encap_recv function even if just a udp header without payload is
available. The callbacks are responsible for correctly verifying and
dropping the packets.

Also, in case the header validation fails for geneve and vxlan we
shouldn't put the skb back into the socket queue, no one will pick
them up there.  Instead we can simply discard them in the respective
encap_recv functions.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 19:56:02 -04:00
David S. Miller e800072c18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
In netdevice.h we removed the structure in net-next that is being
changes in 'net'.  In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.

The mlx5 conflicts have to do with vxlan support dependencies.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 15:59:24 -04:00
Jarno Rajahalme 229740c631 udp_offload: Set encapsulation before inner completes.
UDP tunnel segmentation code relies on the inner offsets being set for
an UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them.  Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done.  This causes the inner
offsets having invalid values after udp_gro_complete() returns, which
in turn will make it impossible to properly segment the packet in case
it needs to be forwarded, which would be visible to the user either as
invalid packets being sent or as packet loss.

This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Jarno Rajahalme 43b8448cd7 udp_tunnel: Remove redundant udp_tunnel_gro_complete().
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Hannes Frederic Sowa 681e683ff3 geneve: break dependency with netdev drivers
Equivalent to "vxlan: break dependency with netdev drivers", don't
autoload geneve module in case the driver is loaded. Instead make the
coupling weaker by using netdevice notifiers as proxy.

Cc: Jesse Gross <jesse@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:35:44 -04:00
Dan Carpenter 1ba64facae geneve: testing the wrong variable in geneve6_build_skb()
We intended to test "err" and not "skb".

Fixes: aed069df09 ('ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:08:00 -04:00
Alexander Duyck aed069df09 ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb
This patch updates the IP tunnel core function iptunnel_handle_offloads so
that we return an int and do not free the skb inside the function.  This
actually allows us to clean up several paths in several tunnels so that we
can free the skb at one point in the path without having to have a
secondary path if we are supporting tunnel offloads.

In addition it should resolve some double-free issues I have found in the
tunnels paths as I believe it is possible for us to end up triggering such
an event in the case of fou or gue.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16 19:09:13 -04:00
Tom Herbert 4a0090a98e geneve: change to use UDP socket GRO
Adapt geneve_gro_receive, geneve_gro_complete to take a socket argument.
Set these functions in tunnel_config.  Don't set udp_offloads any more.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 16:53:29 -04:00
Daniel Borkmann 95caf6f71a geneve: fix populating tclass in geneve_get_v6_dst
The struct flowi6's flowi6_tos is not used in IPv6 route lookup, the
traffic class information is handled in the flowi6's flowlabel member
instead. For example, for policy routing, fib6_rule_match() uses
ip6_tclass() that is applied on the flowlabel for matching on tclass,
which would currently not work as expected.

Fixes: 3a56f86f1b ("geneve: handle ipv6 priority like ipv4 tos")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20 13:44:34 -04:00
Alexander Duyck c194cf93c1 gro: Defer clearing of flush bit in tunnel paths
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler.  Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE.  By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 15:01:00 -04:00
Daniel Borkmann 8eb3b99554 geneve: support setting IPv6 flow label
This work adds support for setting the IPv6 flow label for geneve per
device and through collect metadata (ip_tunnel_key) frontends. Also here,
the geneve dst cache does not need any special considerations, for the
cases where caches can be used, the label is static per cache.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:27 -05:00
Daniel Borkmann 134611446d ip_tunnel: add support for setting flow label via collect metadata
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:26 -05:00
Daniel Borkmann db3c6139e6 bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit
The assumptions from commit 0c1d70af92 ("net: use dst_cache for vxlan
device"), 468dfffcd7 ("geneve: add dst caching support") and 3c1cb4d260
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used is unfortunately not always valid as assumed.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handeled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
helper is added for the three tunnel cases, which checks if we can use dst
cache.

Fixes: 0c1d70af92 ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd7 ("geneve: add dst caching support")
Fixes: 3c1cb4d260 ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 13:58:47 -05:00
Daniel Borkmann 14ca0751c9 bpf: support for access to tunnel options
After eBPF being able to programmatically access/manage tunnel key meta
data via commit d3aa45ce6b ("bpf: add helpers to access tunnel metadata")
and more recently also for IPv6 through c6c3345407 ("bpf: support ipv6
for bpf_skb_{set,get}_tunnel_key"), this work adds two complementary
helpers to generically access their auxiliary tunnel options.

Geneve and vxlan support this facility. For geneve, TLVs can be pushed,
and for the vxlan case its GBP extension. I.e. setting tunnel key for geneve
case only makes sense, if we can also read/write TLVs into it. In the GBP
case, it provides the flexibility to easily map the group policy ID in
combination with other helpers or maps.

I chose to model this as two separate helpers, bpf_skb_{set,get}_tunnel_opt(),
for a couple of reasons. bpf_skb_{set,get}_tunnel_key() is already rather
complex by itself, and there may be cases for tunnel key backends where
tunnel options are not always needed. If we would have integrated this
into bpf_skb_{set,get}_tunnel_key() nevertheless, we are very limited with
remaining helper arguments, so keeping compatibility on structs in case of
passing in a flat buffer gets more cumbersome. Separating both also allows
for more flexibility and future extensibility, f.e. options could be fed
directly from a map, etc.

Moreover, change geneve's xmit path to test only for info->options_len
instead of TUNNEL_GENEVE_OPT flag. This makes it more consistent with vxlan's
xmit path and allows for avoiding to specify a protocol flag in the API on
xmit, so it can be protocol agnostic. Having info->options_len is enough
information that is needed. Tested with vxlan and geneve.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 13:58:46 -05:00
David S. Miller b633353115 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23 00:09:14 -05:00
Alexander Duyck 14f1f72435 GENEVE: Support outer IPv4 Tx checksums by default
This change makes it so that if UDP CSUM is not specified we will default
to enabling it.  The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for GENEVE
tunnels on hardware that doesn't know how to parse them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:05:49 -05:00
Paolo Abeni c868ee7063 lwt: fix rx checksum setting for lwt devices tunneling over ipv6
the commit 35e2d1152b ("tunnels: Allow IPv6 UDP checksums to be
correctly controlled.") changed the default xmit checksum setting
for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
set into external UDP header.
This commit changes the rx checksum setting for both lwt vxlan/geneve
devices created by openvswitch accordingly, so that lwt over ipv6
tunnel pairs are again able to communicate with default values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 15:39:30 -05:00
Jiri Benc fc41cdb322 geneve: clear IFF_TX_SKB_SHARING
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by
geneve as it modifies the skb on xmit.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:43:47 -05:00
Jiri Benc 7f290c9435 iptunnel: scrub packet in iptunnel_pull_header
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Jiri Benc 9fc4754582 geneve: move geneve device lookup before iptunnel_pull_header
This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:31 -05:00
Jiri Benc 1e9f12ec92 geneve: implement geneve_get_sk_family helper
Similarly to the existing vxlan_get_sk_family.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:31 -05:00
David Wragg aeee0e66c6 geneve: Refine MTU limit
Calculate the maximum MTU taking into account the size of headers
involved in GENEVE encapsulation, as for other tunnel types.

Changes in v3:
- Correct comment style
Changes in v2:
- Conform more closely to ip_tunnel_change_mtu
- Exclude GENEVE options from max MTU calculation

Signed-off-by: David Wragg <david@weave.works>
Acked-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 13:57:15 -05:00
Paolo Abeni 468dfffcd7 geneve: add dst caching support
use generic dst implementation for both plain geneve devices and
lwtunnels.

In case of UDP traffic with datagram length below MTU this give
about 2% performance increase for plain geneve tunnel over ipv4,
about 65% performance increase for ipv6 tunnel.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:49 -05:00
David Wragg 7e059158d5 vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets.  4.3 introduced netdevs corresponding
to tunnel vports.  These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated.  The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.

Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10 05:50:03 -05:00
David Wragg 55e5bfb53c geneve: Relax MTU constraints
Allow the MTU of geneve devices to be set to large values, in order to
exploit underlying networks with larger frame sizes.

GENEVE does not have a fixed encapsulation overhead (an openvswitch
rule can add variable length options), so there is no relevant maximum
MTU to enforce.  A maximum of IP_MAX_MTU is used instead.
Encapsulated packets that are too big for the underlying network will
get dropped on the floor.

Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10 05:50:03 -05:00
Jesse Gross 35e2d1152b tunnels: Allow IPv6 UDP checksums to be correctly controlled.
When configuring checksums on UDP tunnels, the flags are different
for IPv4 vs. IPv6 (and reversed). However, when lightweight tunnels
are enabled the flags used are always the IPv4 versions, which are
ignored in the IPv6 code paths. This uses the correct IPv6 flags, so
checksums can be controlled appropriately.

Fixes: a725e514 ("vxlan: metadata based tunneling for IPv6")
Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-21 11:10:40 -08:00
David S. Miller 9d367eddf3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/ethernet/mellanox/mlxsw/spectrum.h
	drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11 23:55:43 -05:00
Hannes Frederic Sowa 787d7ac308 udp: restrict offloads to one namespace
udp tunnel offloads tend to aggregate datagrams based on inner
headers. gro engine gets notified by tunnel implementations about
possible offloads. The match is solely based on the port number.

Imagine a tunnel bound to port 53, the offloading will look into all
DNS packets and tries to aggregate them based on the inner data found
within. This could lead to data corruption and malformed DNS packets.

While this patch minimizes the problem and helps an administrator to find
the issue by querying ip tunnel/fou, a better way would be to match on
the specific destination ip address so if a user space socket is bound
to the same address it will conflict.

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:28:24 -05:00
David S. Miller c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Pravin B Shelar 039f50629b ip_tunnel: Move stats update to iptunnel_xmit()
By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-25 23:32:23 -05:00
Paolo Abeni 184fc8b5ee geneve: initialize needed_headroom
Currently the needed_headroom field for the geneve device is left
to the default value.

This patch set it to space required for basic geneve encapsulation,
so that we can avoid the skb head re-allocation on xmit.

This give a 6% speedup for unsegment traffic on geneve tunnel.

v1 -> v2:
  - add ETH_HLEN for the lower device to the needed headroom

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-23 22:28:29 -05:00
David S. Miller b3e0d3d7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-17 22:08:28 -05:00
Singhai, Anjali 05ca4029b2 geneve: Add geneve_get_rx_port support
This patch adds an op that the drivers can call into to get existing
geneve ports.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-16 10:58:56 -05:00
Singhai, Anjali a8170d2b9e geneve: Add geneve udp port offload for ethernet devices
Add ndo_ops to add/del UDP ports to a device that supports geneve
offload.

v2: Comment fix.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-16 10:58:46 -05:00
Tom Herbert abe492b4f5 geneve: UDP checksum configuration via netlink
Add support to enable and disable UDP checksums via netlink. This is
similar to how VXLAN and GUE allow this. This includes support for
enabling the UDP zero checksum (for both TX and RX).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-13 23:58:03 -05:00
Pravin B Shelar a322a1bcf3 geneve: Fix IPv6 xmit stats update.
Call to iptunnel_xmit_stats() is not required after udp-tunnel6-xmit.
By calling iptunnel_xmit_stats() results in incorrect device stats.
Following patch drops this call.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08 22:39:03 -05:00
John W. Linville b8812fa883 geneve: add IPv6 bits to geneve_fill_metadata_dst
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:10:54 +09:00
John W. Linville 3a56f86f1b geneve: handle ipv6 priority like ipv4 tos
Other callers of udp_tunnel6_xmit_skb just pass 0 for the prio
argument.  Jesse Gross <jesse@nicira.com> suggested that prio is really
the same as IPv4's tos and should be handled the same, so this is my
interpretation of that suggestion.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reported-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:10:51 +09:00
John W. Linville 8ed66f0e82 geneve: implement support for IPv6-based tunnels
NOTE: Link-local IPv6 addresses for remote endpoints are not supported,
since the driver currently has no capacity for binding a geneve
interface to a specific link.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:10:51 +09:00
Pravin B Shelar fc4099f172 openvswitch: Fix egress tunnel info.
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices.  Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 19:39:25 -07:00
Jesse Gross e277de5f3f tunnels: Don't require remote endpoint or ID during creation.
Before lightweight tunnels existed, it really didn't make sense to
create a tunnel that was not fully specified, such as without a
destination IP address - the resulting packets would go nowhere.
However, with lightweight tunnels, the opposite is true - it doesn't
make sense to require this information when it will be provided later
on by the route. This loosens the requirements for this information.

An alternative would be to allow the relaxed version only when
COLLECT_METADATA is enabled. However, since there are several
variations on this theme (such as NBMA tunnels in GRE), just dropping
the restrictions seems the most consistent across tunnels and with
the existing configuration.

CC: John Linville <linville@tuxdriver.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:44:10 -07:00
John W. Linville 7bbe33ff18 geneve: use network byte order for destination port config parameter
This is primarily for consistancy with vxlan and other tunnels which
use network byte order for similar parameters.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 15:41:04 -07:00
John W. Linville 08399efc63 geneve: ensure ECN info is handled properly in all tx/rx paths
Partially due to a pre-exising "thinko", the new metadata-based tx/rx
paths were handling ECN propagation differently than the traditional
tx/rx paths.  This patch removes the "thinko" (involving multiple
ip_hdr assignments) on the rx path and corrects the ECN handling on
both the rx and tx paths.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22 16:49:56 -07:00