Commit graph

438 commits

Author SHA1 Message Date
David Wragg
7e059158d5 vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets.  4.3 introduced netdevs corresponding
to tunnel vports.  These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated.  The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.

Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10 05:50:03 -05:00
Hannes Frederic Sowa
b064d0d88a ovs: limit ovs recursions in ovs_execute_actions to not corrupt stack
It was seen that defective configurations of openvswitch could overwrite
the STACK_END_MAGIC and cause a hard crash of the kernel because of too
many recursions within ovs.

This problem arises due to the high stack usage of openvswitch. The rest
of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).

We use the already existing recursion counter in ovs_execute_actions to
implement an upper bound of 5 recursions.

Cc: Pravin Shelar <pshelar@ovn.org>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-18 12:09:45 -05:00
Konstantin Khlebnikov
9207f9d45b net: preserve IP control block during GSO segmentation
Skb_gso_segment() uses skb control block during segmentation.
This patch adds 32-bytes room for previous control block which
will be copied into all resulting segments.

This patch fixes kernel crash during fragmenting forwarded packets.
Fragmentation requires valid IP CB in skb for clearing ip options.
Also patch removes custom save/restore in ovs code, now it's redundant.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-15 14:35:24 -05:00
Jean Sacren
c5420eb12f openvswitch: update kernel doc for struct vport
commit be4ace6e6b ("openvswitch: Move dev pointer into vport itself")

The commit above added @dev and moved @rcu to the bottom of struct
vport, but the change was not reflected in the kernel doc. So let's
update the kernel doc as well.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 23:49:21 -05:00
Jean Sacren
2f7066ada1 openvswitch: fix struct geneve_port member name
commit 6b001e682e ("openvswitch: Use Geneve device.")

The commit above introduced 'port_no' as the name for the member of
struct geneve_port. The correct name should be 'dst_port' as described
in the kernel doc. Let's fix that member name and all the pertinent
instances so that both doc and code would be consistent.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 23:49:21 -05:00
Jean Sacren
5ea030429f openvswitch: clean up unused function
commit 6b001e682e ("openvswitch: Use Geneve device.")

The commit above deleted the only call site of ovs_tunnel_route_lookup()
and now that function is not used any more. So let's delete the function
definition as well.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 23:49:21 -05:00
David S. Miller
c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Joe Stringer
90c7afc96c openvswitch: Fix template leak in error cases.
Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
reference leak on helper objects, but inadvertently introduced a leak on
the ct template.

Previously, ct_info.ct->general.use was initialized to 0 by
nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
returned successful. If an error occurred while adding the helper or
adding the action to the actions buffer, the __ovs_ct_free_action()
cleanup would use nf_ct_put() to free the entry; However, this relies on
atomic_dec_and_test(ct_info.ct->general.use). This reference must be
incremented first, or nf_ct_put() will never free it.

Fix the issue by acquiring a reference to the template immediately after
allocation.

Fixes: cae3a26275 ("openvswitch: Allow attaching helpers to ct action")
Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-29 15:27:52 -05:00
Simon Horman
e905eabc90 openvswitch: correct encoding of set tunnel action attributes
In a set action tunnel attributes should be encoded in a
nested action.

I noticed this because ovs-dpctl was reporting an error
when dumping flows due to the incorrect encoding of tunnel attributes
in a set action.

Fixes: fc4099f172 ("openvswitch: Fix egress tunnel info.")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 16:35:35 -05:00
David S. Miller
59ce9670ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:

1) Two patches to include dependencies in our netfilter userspace
   headers to resolve compilation problems, from Mikko Rapeli.

2) Four comestic cleanup patches for the ebtables codebase, from Ian Morris.

3) Remove duplicate include in the netfilter reject infrastructure,
   from Stephen Hemminger.

4) Two patches to simplify the netfilter defragmentation code for IPv6,
   patch from Florian Westphal.

5) Fix root ownership of /proc/net netfilter for unpriviledged net
   namespaces, from Philip Whineray.

6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

7) Add mangling support to our nf_tables payload expression, from
   Patrick McHardy.

8) Introduce a new netlink-based tracing infrastructure for nf_tables,
   from Florian Westphal.

9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

10) Add netns support to the cttimeout infrastructure.

11) Add cgroup2 support to iptables, from Tejun Heo.

12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

BTW, I need that you pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 15:37:42 -05:00
David S. Miller
b3e0d3d7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-17 22:08:28 -05:00
Pablo Neira Ayuso
a4ec80082c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Resolve conflict between commit 264640fc2c ("ipv6: distinguish frag
queues by device for multicast and link-local packets") from the net
tree and commit 029f7f3b87 ("netfilter: ipv6: nf_defrag: avoid/free
clone operations") from the nf-next tree.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Conflicts:
	net/ipv6/netfilter/nf_conntrack_reasm.c
2015-12-14 20:31:16 +01:00
Joe Stringer
d110986c5d openvswitch: Respect conntrack zone even if invalid
If userspace executes ct(zone=1), and the connection tracker determines
that the packet is invalid, then the ct_zone flow key field is populated
with the default zone rather than the zone that was specified. Even
though connection tracking failed, this field should be updated with the
value that the action specified. Fix the issue.

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 23:31:31 -05:00
Joe Stringer
2f3ab9f9fc openvswitch: Fix helper reference leak
If the actions (re)allocation fails, or the actions list is larger than the
maximum size, and the conntrack action is the last action when these
problems are hit, then references to helper modules may be leaked. Fix
the issue.

Fixes: cae3a26275 ("openvswitch: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-11 23:31:31 -05:00
David S. Miller
f188b951f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/renesas/ravb_main.c
	kernel/bpf/syscall.c
	net/ipv4/ipmr.c

All three conflicts were cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 21:09:12 -05:00
Paolo Abeni
1317530302 openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:29:25 -05:00
Jiri Pirko
29bf24afb2 net: add possibility to pass information about upper device via notifier
Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:25 -05:00
Jiri Pirko
6dffb0447c net: propagate upper priv via netdev_master_upper_dev_link
Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:25 -05:00
Paolo Abeni
83e4bf7a74 openvswitch: properly refcount vport-vxlan module
After 614732eaa1, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.

v1 -> v2:
 - move vport 'owner' initialization in ovs_vport_ops_register()
   and make such function a macro

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02 11:50:59 -05:00
Aaron Conole
20f795666d net: openvswitch: Remove invalid comment
During pre-upstream development, the openvswitch datapath used a custom
hashtable to store vports that could fail on delete due to lack of
memory. However, prior to upstream submission, this code was reworked to
use an hlist based hastable with flexible-array based buckets. As such
the failure condition was eliminated from the vport_del path, rendering
this comment invalid.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 17:18:00 -05:00
Florian Westphal
daaa7d647f netfilter: ipv6: avoid nf_iterate recursion
The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
with the previous one.

This means that the return value is always NULL or the skb argument.
So change it to an err value.

Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
we can now just return NF_ACCEPT to move on to the next hook except for
-EINPROGRESS (which means skb has been queued for reassembly), in which case we
return NF_STOLEN.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:45 +01:00
Florian Westphal
029f7f3b87 netfilter: ipv6: nf_defrag: avoid/free clone operations
commit 6aafeef03b
("netfilter: push reasm skb through instead of original frag skbs")
changed ipv6 defrag to not use the original skbs anymore.

So rather than keeping the original skbs around just to discard them
afterwards just use the original skbs directly for the fraglist of
the newly assembled skb and remove the extra clone/free operations.

The skb that completes the fragment queue is morphed into a the
reassembled one instead, just like ipv4 defrag.

openvswitch doesn't need any additional skb_morph magic anymore to deal
with this situation so just remove that.

A followup patch can then also remove the NF_HOOK (re)invocation in
the ipv6 netfilter defrag hook.

Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:44 +01:00
Linus Torvalds
75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
David S. Miller
b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
Joe Stringer
6f5cadee44 openvswitch: Fix skb leak using IPv6 defrag
nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the
reassembly is successful, expects the caller to free all of the original
skbs using nf_ct_frag6_consume_orig(). This call was previously missing,
meaning that the original fragments were never freed (with the exception
of the last fragment to arrive).

Fix this by ensuring that all original fragments except for the last
fragment are freed via nf_ct_frag6_consume_orig(). The last fragment
will be morphed into the head, so it must not be freed yet. Furthermore,
retain the ->next pointer for the head after skb_morph().

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:32:18 -07:00
Joe Stringer
74c1661813 openvswitch: Fix double-free on ip_defrag() errors
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
freed. When handle_fragments() passes this back up to
do_execute_actions(), it will be freed again. Prevent this double free
by never freeing the skb in do_execute_actions() for errors returned by
ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:32:14 -07:00
David S. Miller
ba3e2084f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/xfrm6_output.c
	net/openvswitch/flow_netlink.c
	net/openvswitch/vport-gre.c
	net/openvswitch/vport-vxlan.c
	net/openvswitch/vport.c
	net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes.  One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:54:12 -07:00
Pravin B Shelar
fc4099f172 openvswitch: Fix egress tunnel info.
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices.  Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 19:39:25 -07:00
Pravin B Shelar
aec1592474 openvswitch: Use dev_queue_xmit for vport send.
With use of lwtunnel, we can directly call dev_queue_xmit()
rather than calling netdev vport send operation.
Following change make tunnel vport code bit cleaner.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:46:16 -07:00
Pravin B Shelar
99e28f18e3 openvswitch: Fix incorrect type use.
Patch fixes following sparse warning.
net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types)
net/openvswitch/flow_netlink.c:583:30:    expected restricted __be16 [usertype] ipv4
net/openvswitch/flow_netlink.c:583:30:    got int

Fixes: 6b26ba3a7d ("openvswitch: netlink attributes for IPv6 tunneling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:46:13 -07:00
Joe Stringer
e754ec69ab openvswitch: Serialize nested ct actions if provided
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.

Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:43 -07:00
Joe Stringer
4f0909ee3d openvswitch: Mark connections new when not confirmed.
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct).

Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:40 -07:00
Joe Stringer
9e384715e9 openvswitch: Reject ct_state masks for unknown bits
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:36 -07:00
James Morse
1241365f1a openvswitch: Allocate memory for ovs internal device stats.
"openvswitch: Remove vport stats" removed the per-vport statistics, in
order to use the netdev's statistics fields.
"openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
to user-space, by using the provided netdev_ops to collate them - but ovs
internal devices still use an unallocated dev->tstats field to count
packets, which are no longer exported by this api.

Allocate the dev->tstats field for ovs internal devices, and wire up
ndo_get_stats64 with the original implementation of
ovs_vport_get_stats().

On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
unmasking a full-on panic on arm64:

=============%<==============
[<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch]
[<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch]
[<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch]
[<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch]
[<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
[<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310
[<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4
[<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc
[<ffffffc0005e8a94>] genl_rcv+0x38/0x50
[<ffffffc0005e76c0>] netlink_unicast+0x164/0x210
[<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368
[<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c
[SNIP]
Kernel panic - not syncing: Fatal exception in interrupt
=============%<==============

Fixes: 8c876639c9 ("openvswitch: Remove vport stats.")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:06:36 -07:00
David S. Miller
26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Joe Stringer
740dbc2891 openvswitch: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.

This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:24:50 -07:00
Eric W. Biederman
b72775977c ipv6: Pass struct net into nf_ct_frag6_gather
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack.  In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.

The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:17 -07:00
Eric W. Biederman
19bcf9f203 ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack.  In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.

So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:16 -07:00
Joe Stringer
ab38a7b5a4 openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT
Previously, the CT_ATTR_FLAGS attribute, when nested under the
OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the
semantics of the ct action. It's more extensible to just represent each
flag as a nested attribute, and this requires no additional error
checking to reject flags that aren't currently supported.

Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:03:06 -07:00
Joe Stringer
fbccce5965 openvswitch: Extend ct_state match field to 32 bits
The ct_state field was initially added as an 8-bit field, however six of
the bits are already being used and use cases are already starting to
appear that may push the limits of this field. This patch extends the
field to 32 bits while retaining the internal representation of 8 bits.
This should cover forward compatibility of the ABI for the foreseeable
future.

This patch also reorders the OVS_CS_F_* bits to be sequential.

Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:03:06 -07:00
Joe Stringer
6f22595246 openvswitch: Reject ct_state unsupported bits
Previously, if userspace specified ct_state bits in the flow key which
are currently undefined (and therefore unsupported), then they would be
ignored. This could cause unexpected behaviour in future if userspace is
extended to support additional bits but attempts to communicate with the
current version of the kernel. This patch rectifies the situation by
rejecting such ct_state bits.

Fixes: 7f8a436eaa "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:03:05 -07:00
Joe Stringer
ec0d043d05 openvswitch: Ensure flow is valid before executing ct
The ct action uses parts of the flow key, so we need to ensure that it
is valid before executing that action.

Fixes: 7f8a436eaa "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:03:05 -07:00
Joe Stringer
b8f2257069 openvswitch: Fix skb leak in ovs_fragment()
If ovs_fragment() was unable to fragment the skb due to an L2 header
that exceeds the supported length, skbs would be leaked. Fix the bug.

Fixes: 7f8a436eaa "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 05:03:03 -07:00
Jiri Benc
6b26ba3a7d openvswitch: netlink attributes for IPv6 tunneling
Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support
for tunnels.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:18:00 -07:00
Jiri Benc
00a93babd0 openvswitch: add tunnel protocol to sw_flow_key
Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses
in an union with IPv4 ones this won't work anymore).

The new field was added to a hole in sw_flow_key.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:17:59 -07:00
Pravin B Shelar
83ffe99f52 openvswitch: Fix ovs_vport_get_stats()
Not every device has dev->tstats set. So when OVS tries to calculate
vport stats it causes kernel panic. Following patch fixes it by
using standard API to get net-device stats.

---8<---
Unable to handle kernel paging request at virtual address 766b4008
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: vport_vxlan vxlan ip6_udp_tunnel udp_tunnel tun bridge stp llc openvswitch ipv6
CPU: 7 PID: 1108 Comm: ovs-vswitchd Not tainted 4.3.0-rc3+ #82
PC is at ovs_vport_get_stats+0x150/0x1f8 [openvswitch]
<snip>
Call trace:
 [<ffffffbffc0859f8>] ovs_vport_get_stats+0x150/0x1f8 [openvswitch]
 [<ffffffbffc07cdb0>] ovs_vport_cmd_fill_info+0x140/0x1e0 [openvswitch]
 [<ffffffbffc07cf0c>] ovs_vport_cmd_dump+0xbc/0x138 [openvswitch]
 [<ffffffc00045a5ac>] netlink_dump+0xb8/0x258
 [<ffffffc00045ace0>] __netlink_dump_start+0x120/0x178
 [<ffffffc00045dd9c>] genl_family_rcv_msg+0x2d4/0x308
 [<ffffffc00045de58>] genl_rcv_msg+0x88/0xc4
 [<ffffffc00045cf24>] netlink_rcv_skb+0xd4/0x100
 [<ffffffc00045dab0>] genl_rcv+0x30/0x48
 [<ffffffc00045c830>] netlink_unicast+0x154/0x200
 [<ffffffc00045cc9c>] netlink_sendmsg+0x308/0x364
 [<ffffffc00041e10c>] sock_sendmsg+0x14/0x2c
 [<ffffffc000420d58>] SyS_sendto+0xbc/0xf0
Code: aa1603e1 f94037a4 aa1303e2 aa1703e0 (f9400465)

Reported-by: Tomasz Sawicki <tomasz.sawicki@objectiveintegration.uk>
Fixes: 8c876639c9 ("openvswitch: Remove vport stats.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 07:06:35 -07:00
Konstantin Khlebnikov
598c12d0ba ovs: do not allocate memory from offline numa node
When openvswitch tries allocate memory from offline numa node 0:
stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
[ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
This patch disables numa affinity in this case.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:42:03 -07:00
Joe Stringer
33db4125ec openvswitch: Rename LABEL->LABELS
Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
for these to be consistent with conntrack.

Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05 06:34:28 -07:00
Eric W. Biederman
184e16d79d openvswitch: Remove ovs_vport_output_sk
This was a compatibility function needed while the ipv4 and ipv6
fragmentation code was being modified to pass a struct net through
them.  Now that is complete this function has no more users so remove
it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-09-30 01:45:05 -05:00
Eric W. Biederman
7d8c6e3915 ipv6: Pass struct net through ip6_fragment
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-09-30 01:45:03 -05:00