Commit graph

879 commits

Author SHA1 Message Date
Mianhan Liu
a1554c0026 include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h
nr_free_buffer_pages could be exposed through mm.h instead of swap.h.
The advantage of this change is that it can reduce the obsolete
includes.  For example, net/ipv4/tcp.c wouldn't need swap.h any more
since it has already included mm.h.  Similarly, after checking all the
other files, it comes that tcp.c, udp.c meter.c ,...  follow the same
rule, so these files can have swap.h removed too.

Moreover, after preprocessing all the files that use
nr_free_buffer_pages, it turns out that those files have already
included mm.h.Thus, we can move nr_free_buffer_pages from swap.h to mm.h
safely.  This change will not affect the compilation of other files.

Link: https://lkml.kernel.org/r/20210912133640.1624-1-liumh1@shanghaitech.edu.cn
Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Cc: Jakub Kicinski <kuba@kernel.org>
CC: Ulf Hansson <ulf.hansson@linaro.org>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
Jakub Kicinski
f444fea789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/ptp/Kconfig:
  55c8fca1da ("ptp_pch: Restore dependency on PCI")
  e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 18:09:18 -07:00
kaixi.fan
01634047bf ovs: clear skb->tstamp in forwarding path
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.

Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 11:31:13 +01:00
Jakub Kicinski
f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Randy Dunlap
d6e712aa7e net: openvswitch: fix kernel-doc warnings in flow.c
Repair kernel-doc notation in a few places to make it conform to
the expected format.

Fixes the following kernel-doc warnings:

flow.c:296: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Parse vlan tag from vlan header.
flow.c:296: warning: missing initial short description on line:
 * Parse vlan tag from vlan header.
flow.c:537: warning: No description found for return value of 'key_extract_l3l4'
flow.c:769: warning: No description found for return value of 'key_extract'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: dev@openvswitch.org
Link: https://lore.kernel.org/r/20210808190834.23362-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09 15:37:35 -07:00
Mark Gray
076999e460 openvswitch: fix sparse warning incorrect type
fix incorrect type in argument 1 (different address spaces)

../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces)
../net/openvswitch/datapath.c:169:17:    expected void const *
../net/openvswitch/datapath.c:169:17:    got struct dp_nlsk_pids [noderef] __rcu *upcall_portids

Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159

Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 11:48:43 +01:00
Mark Gray
784dcfa56e openvswitch: fix alignment issues
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 11:48:42 +01:00
Mark Gray
b83d23a2a3 openvswitch: Introduce per-cpu upcall dispatch
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.

This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:

* On systems with a large number of vports, there is a correspondingly
large number of Netlink sockets which can limit scaling.
(https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
(https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
(https://bugzilla.redhat.com/show_bug.cgi?id=1834444)

This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.

In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:

a) The number of Netlink sockets scales with the number of CPUs
rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
CPUs based on mechanisms such as RSS and flows are distributed
to a single user space thread.
c) Packets from a flow can only wake up one user space thread.

The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html

Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16 11:06:33 -07:00
Baowen Zheng
b18114476a openvswitch: Optimize operation for key comparison
In the current implement when comparing two flow keys, we will return
result after comparing the whole key from start to end.

In our optimization, we will return result in the first none-zero
comparison, then we will improve the flow table looking up efficiency.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01 11:13:10 -07:00
Aaron Conole
c4ab7b56be openvswitch: add trace points
This makes openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline.  When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command.  However, such a command has some
limitations:

  1. When trying to trace packets that go through the CT action,
     the state of the packet can't be determined, and probably
     would be potentially wrong.

  2. Deducing problem packets can sometimes be difficult as well
     even if many of the flows are known

  3. It's possible to use the openvswitch module even without
     the ovs-vswitchd (although, not common use).

Introduce the event tracing points here to make it possible for
working through these problems in kernel space.  The style is
copied from the mac80211 driver-trace / trace code for
consistency - this creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples all add the same splats so it seems acceptable.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:47:32 -07:00
Jakub Kicinski
5ada57a9a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 09:55:10 -07:00
Tao Liu
e4df1b0c24 openvswitch: meter: fix race when getting now_ms.
We have observed meters working unexpected if traffic is 3+Gbit/s
with multiple connections.

now_ms is not pretected by meter->lock, we may get a negative
long_delta_ms when another cpu updated meter->used, then:
    delta_ms = (u32)long_delta_ms;
which will be a large value.

    band->bucket += delta_ms * band->rate;
then we get a wrong band->bucket.

OpenVswitch userspace datapath has fixed the same issue[1] some
time ago, and we port the implementation to kernel datapath.

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/

Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:54:59 -07:00
Yejune Deng
d2792e91de net: openvswitch: Remove unnecessary skb_nfct()
There is no need add 'if (skb_nfct(skb))' assignment, the
nf_conntrack_put() would check it.

Signed-off-by: Yejune Deng <yejunedeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10 14:18:19 -07:00
Davide Caratti
7c0ea5930c openvswitch: fix stack OOB read while fragmenting IPv4 packets
running openvswitch on kernels built with KASAN, it's possible to see the
following splat while testing fragmentation of IPv4 packets:

 BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
 Read of size 1 at addr ffff888112fc713c by task handler2/1367

 CPU: 0 PID: 1367 Comm: handler2 Not tainted 5.12.0-rc6+ #418
 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
 Call Trace:
  dump_stack+0x92/0xc1
  print_address_description.constprop.7+0x1a/0x150
  kasan_report.cold.13+0x7f/0x111
  ip_do_fragment+0x1b03/0x1f60
  ovs_fragment+0x5bf/0x840 [openvswitch]
  do_execute_actions+0x1bd5/0x2400 [openvswitch]
  ovs_execute_actions+0xc8/0x3d0 [openvswitch]
  ovs_packet_cmd_execute+0xa39/0x1150 [openvswitch]
  genl_family_rcv_msg_doit.isra.15+0x227/0x2d0
  genl_rcv_msg+0x287/0x490
  netlink_rcv_skb+0x120/0x380
  genl_rcv+0x24/0x40
  netlink_unicast+0x439/0x630
  netlink_sendmsg+0x719/0xbf0
  sock_sendmsg+0xe2/0x110
  ____sys_sendmsg+0x5ba/0x890
  ___sys_sendmsg+0xe9/0x160
  __sys_sendmsg+0xd3/0x170
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f957079db07
 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48
 RSP: 002b:00007f956ce35a50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f957079db07
 RDX: 0000000000000000 RSI: 00007f956ce35ae0 RDI: 0000000000000019
 RBP: 00007f956ce35ae0 R08: 0000000000000000 R09: 00007f9558006730
 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
 R13: 00007f956ce37308 R14: 00007f956ce35f80 R15: 00007f956ce35ae0

 The buggy address belongs to the page:
 page:00000000af2a1d93 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112fc7
 flags: 0x17ffffc0000000()
 raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 addr ffff888112fc713c is located in stack of task handler2/1367 at offset 180 in frame:
  ovs_fragment+0x0/0x840 [openvswitch]

 this frame has 2 objects:
  [32, 144) 'ovs_dst'
  [192, 424) 'ovs_rt'

 Memory state around the buggy address:
  ffff888112fc7000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff888112fc7080: 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00
 >ffff888112fc7100: 00 00 00 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
                                         ^
  ffff888112fc7180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff888112fc7200: 00 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00

for IPv4 packets, ovs_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

  ip_do_fragment()
    ip_skb_dst_mtu()
      ip_dst_mtu_maybe_forward()
        ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in ovs_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: d52e5a7e7c ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt")
Cc: <stable@vger.kernel.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-29 15:31:53 -07:00
David S. Miller
6876a18d33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-26 12:00:00 -07:00
Ilya Maximets
7d742b509d openvswitch: meter: remove rate from the bucket size calculation
Implementation of meters supposed to be a classic token bucket with 2
typical parameters: rate and burst size.

Burst size in this schema is the maximum number of bytes/packets that
could pass without being rate limited.

Recent changes to userspace datapath made meter implementation to be
in line with the kernel one, and this uncovered several issues.

The main problem is that maximum bucket size for unknown reason
accounts not only burst size, but also the numerical value of rate.
This creates a lot of confusion around behavior of meters.

For example, if rate is configured as 1000 pps and burst size set to 1,
this should mean that meter will tolerate bursts of 1 packet at most,
i.e. not a single packet above the rate should pass the meter.
However, current implementation calculates maximum bucket size as
(rate + burst size), so the effective bucket size will be 1001.  This
means that first 1000 packets will not be rate limited and average
rate might be twice as high as the configured rate.  This also makes
it practically impossible to configure meter that will have burst size
lower than the rate, which might be a desirable configuration if the
rate is high.

Inability to configure low values of a burst size and overall inability
for a user to predict what will be a maximum and average rate from the
configured parameters of a meter without looking at the OVS and kernel
code might be also classified as a security issue, because drop meters
are frequently used as a way of protection from DoS attacks.

This change removes rate from the calculation of a bucket size, making
it in line with the classic token bucket algorithm and essentially
making the rate and burst tolerance being predictable from a users'
perspective.

Same change proposed for the userspace implementation.

Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-23 13:08:47 -07:00
Jakub Kicinski
8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Ilya Maximets
4d51419d49 openvswitch: fix send of uninitialized stack memory in ct limit reply
'struct ovs_zone_limit' has more members than initialized in
ovs_ct_limit_get_default_limit().  The rest of the memory is a random
kernel stack content that ends up being sent to userspace.

Fix that by using designated initializer that will clear all
non-specified fields.

Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 12:54:42 -07:00
Christophe JAILLET
7d42e84eb9 net: openvswitch: Use 'skb_push_rcsum()' instead of hand coding it
'skb_push()'/'skb_postpush_rcsum()' can be replaced by an equivalent
'skb_push_rcsum()' which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-04 01:43:02 -07:00
David S. Miller
efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Bhaskar Chowdhury
f44773058c openvswitch: Fix a typo
s/subsytem/subsystem/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-22 12:59:46 -07:00
Flavio Leitner
ebfbc46b35 openvswitch: Warn over-mtu packets only if iface is UP.
It is not unusual to have the bridge port down. Sometimes
it has the old MTU, which is fine since it's not being used.

However, the kernel spams the log with a warning message
when a packet is going to be sent over such port. Fix that
by warning only if the interface is UP.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 16:28:30 -07:00
wenxu
d29334c15d net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct
When openvswitch conntrack offload with act_ct action. The first rule
do conntrack in the act_ct in tc subsystem. And miss the next rule in
the tc and fallback to the ovs datapath but miss set post_ct flag
which will lead the ct_state_key with -trk flag.

Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:22:18 -07:00
Eelco Chaudron
a5317f3b06 net: openvswitch: add log message for error case
As requested by upstream OVS, added some error messages in the
validate_and_copy_dec_ttl function.

Includes a small cleanup, which removes an unnecessary parameter
from the dec_ttl_exception_handler() function.

Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Link: https://lore.kernel.org/r/161054576573.26637.18396634650212670580.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 16:32:14 -08:00
Eelco Chaudron
09d6217254 net: openvswitch: fix TTL decrement exception action execution
Currently, the exception actions are not processed correctly as the wrong
dataset is passed. This change fixes this, including the misleading
comment.

In addition, a check was added to make sure we work on an IPv4 packet,
and not just assume if it's not IPv6 it's IPv4.

This was all tested using OVS with patch,
https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
applied and sending packets with a TTL of 1 (and 0), both with IPv4
and IPv6.

Fixes: 69929d4c49 ("net: openvswitch: fix TTL decrement action netlink message format")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160733569860.3007.12938188180387116741.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:18:25 -08:00
Jakub Kicinski
46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Zheng Yongjun
5e359044c1 net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-08 16:22:54 -08:00
Wang Hai
bb2da7651a openvswitch: fix error return code in validate_and_copy_dec_ttl()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Changing 'return start' to 'return action_start' can fix this bug.

Fixes: 69929d4c49 ("net: openvswitch: fix TTL decrement action netlink message format")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 15:43:14 -08:00
Jakub Kicinski
55fd59b003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:
	drivers/net/ethernet/ibm/ibmvnic.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 15:44:09 -08:00
Davide Caratti
43c13605ba net: openvswitch: ensure LSE is pullable before reading it
when openvswitch is configured to mangle the LSE, the current value is
read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that
the label is contained in the skb "linear" area.

Found by code inspection.

Fixes: d27cf5c59a ("net: core: add MPLS update core helper and use in OvS")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 11:13:29 -08:00
Jakub Kicinski
5c39f26e67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.

Conflicts:
	drivers/net/can/usb/gs_usb.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 18:25:27 -08:00
Eelco Chaudron
69929d4c49 net: openvswitch: fix TTL decrement action netlink message format
Currently, the openvswitch module is not accepting the correctly formated
netlink message for the TTL decrement action. For both setting and getting
the dec_ttl action, the actions should be nested in the
OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.

When the original patch was sent, it was tested with a private OVS userspace
implementation. This implementation was unfortunately not upstreamed and
reviewed, hence an erroneous version of this patch was sent out.

Leaving the patch as-is would cause problems as the kernel module could
interpret additional attributes as actions and vice-versa, due to the
actions not being encapsulated/nested within the actual attribute, but
being concatinated after it.

Fixes: 744676e777 ("openvswitch: add TTL decrement action")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 11:03:06 -08:00
Numan Siddique
e2ef5203c8 net: openvswitch: Be liberal in tcp conntrack.
There is no easy way to distinguish if a conntracked tcp packet is
marked invalid because of tcp_in_window() check error or because
it doesn't belong to an existing connection. With this patch,
openvswitch sets liberal tcp flag for the established sessions so
that out of window packets are not marked invalid.

A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
sets this flag for both the directions of the nf_conn.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 09:53:48 -08:00
Lev Stipakov
865e6ae02d net: openvswitch: use core API to update/provide stats
Commit d3fd65484c ("net: core: add dev_sw_netstats_tx_add") has added
function "dev_sw_netstats_tx_add()" to update net device per-cpu TX
stats.

Use this function instead of own code.

While on it, remove internal_get_stats() and replace it
with dev_get_tstats64().

Signed-off-by: Lev Stipakov <lev@openvpn.net>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 16:59:32 -08:00
Jakub Kicinski
ae0d0bb29b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 17:33:38 -08:00
Eelco Chaudron
fea07a487c net: openvswitch: silence suspicious RCU usage warning
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
by replacing rcu_dereference() with rcu_dereference_ovsl().

In addition, when creating a new datapath, make sure it's configured under
the ovs_lock.

Fixes: 9bf24f594c ("net: openvswitch: make masks cache size configurable")
Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-03 16:57:42 -08:00
YueHaibing
92f9e238c9 openvswitch: Use IS_ERR instead of IS_ERR_OR_NULL
Fix smatch warning:

net/openvswitch/meter.c:427 ovs_meter_cmd_set() warn: passing zero to 'PTR_ERR'

dp_meter_create() never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20201031060153.39912-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02 17:34:26 -08:00
Andrew Lunn
210bba6790 net: openvswitch: Fix kerneldoc warnings
net/openvswitch/flow.c:303: warning: Function parameter or member 'key_vh' not described in 'parse_vlan_tag'
net/openvswitch/flow.c:303: warning: Function parameter or member 'skb' not described in 'parse_vlan_tag'
net/openvswitch/flow.c:303: warning: Function parameter or member 'untag_vlan' not described in 'parse_vlan_tag'
net/openvswitch/vport.c:122: warning: Function parameter or member 'parms' not described in 'ovs_vport_alloc'

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201028004849.930094-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 11:39:43 -07:00
Eelco Chaudron
f981fc3d51 net: openvswitch: fix to make sure flow_lookup() is not preempted
The flow_lookup() function uses per CPU variables, which must be called
with BH disabled. However, this is fine in the general NAPI use case
where the local BH is disabled. But, it's also called from the netlink
context. The below patch makes sure that even in the netlink path, the
BH is disabled.

In addition, u64_stats_update_begin() requires a lock to ensure one writer
which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
ensures that there is always only one writer.

Fixes: eac87c413b ("net: openvswitch: reorder masks array based on usage")
Reported-by: Juri Lelli <jlelli@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-18 12:29:36 -07:00
Heiner Kallweit
3569939a81 net: openvswitch: use new function dev_fetch_sw_netstats
Simplify the code by using new function dev_fetch_sw_netstats().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/5e52dc91-97b1-82b0-214b-65d404e4a2ec@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13 17:33:49 -07:00
Jakub Kicinski
9d49aea13f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08 15:44:50 -07:00
Dumitru Ceara
8aa7b526dc openvswitch: handle DNAT tuple collision
With multiple DNAT rules it's possible that after destination
translation the resulting tuples collide.

For example, two openvswitch flows:
nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))

Assuming two TCP clients initiating the following connections:
10.0.0.10:5000->10.0.0.10:10
10.0.0.10:5000->10.0.0.20:10

Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
nf_conntrack_confirm() to fail because of tuple collision.

Netfilter handles this case by allocating a null binding for SNAT at
egress by default.  Perform the same operation in openvswitch for DNAT
if no explicit SNAT is requested by the user and allocate a null binding
for SNAT for packets in the "original" direction.

Reported-at: https://bugzilla.redhat.com/1877128
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08 12:20:35 -07:00
Fabian Frederick
e40b3727f9 net: openvswitch: use dev_sw_netstats_rx_add()
use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-06 06:23:21 -07:00
Rikard Falkeborn
b980b313e5 net: openvswitch: Constify static struct genl_small_ops
The only usage of these is to assign their address to the small_ops field
in the genl_family struct, which is a const pointer, and applying
ARRAY_SIZE() on them. Make them const to allow the compiler to put them
in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-04 21:13:36 -07:00
Guillaume Nault
19fbcb36a3 net/sched: act_vlan: Add {POP,PUSH}_ETH actions
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
respectively pop and push a base Ethernet header at the beginning of a
frame.

POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
must be stripped before calling POP_ETH.

PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
addresses can be configured. The Ethertype is automatically set from
skb->protocol. These restrictions ensure that all skb's fields remain
consistent, so that this action can't confuse other part of the
networking stack (like GSO).

Since openvswitch already had these actions, consolidate the code in
skbuff.c (like for vlan and mpls push/pop).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03 17:28:45 -07:00
Jakub Kicinski
66a9b9287d genetlink: move to smaller ops wherever possible
Bulk of the genetlink users can use smaller ops, move them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 19:11:11 -07:00
Zeng Tao
7b066d173b net: openswitch: reuse the helper variable to improve the code readablity
In the function ovs_ct_limit_exit, there is already a helper vaibale
which could be reused to improve the readability, so i fix it in this
patch.

Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 14:24:08 -07:00
Jakub Kicinski
44a8c4f33c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
We got slightly different patches removing a double word
in a comment in net/ipv4/raw.c - picked the version from net.

Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
values instead of VNIC login response buffer (following what
commit 507ebe6444 ("ibmvnic: Fix use-after-free of VNIC login
response buffer") did).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-04 21:28:59 -07:00
Eelco Chaudron
e0afe91443 net: openvswitch: fixes crash if nf_conncount_init() fails
If nf_conncount_init fails currently the dispatched work is not canceled,
causing problems when the timer fires. This change fixes this by not
scheduling the work until all initialization is successful.

Fixes: a65878d6f0 ("net: openvswitch: fixes potential deadlock in dp cleanup code")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 13:23:23 -07:00
Tonghao Zhang
e6896163b2 net: openvswitch: remove unused keep_flows
keep_flows was introduced by [1], which used as flag to delete flows or not.
When rehashing or expanding the table instance, we will not flush the flows.
Now don't use it anymore, remove it.

[1] - acd051f176
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 11:42:15 -07:00
Tonghao Zhang
df68d64ee3 net: openvswitch: refactor flow free function
Decrease table->count and ufid_count unconditionally,
because we only don't use count or ufid_count to count
when flushing the flows. To simplify the codes, we
remove the "count" argument of table_instance_flow_free.

To avoid a bug when deleting flows in the future, add
WARN_ON in flush flows function.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 11:42:15 -07:00
Tonghao Zhang
cf3266ad48 net: openvswitch: improve the coding style
Not change the logic, just improve the coding style.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 11:42:15 -07:00
wenxu
a7c978c6c9 openvswitch: using ip6_fragment in ipv6_stub
Using ipv6_stub->ipv6_fragment to avoid the netfilter dependency

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31 12:26:39 -07:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Tonghao Zhang
1f3a090b90 net: openvswitch: introduce common code for flushing flows
To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b32 ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13 15:53:30 -07:00
Tonghao Zhang
5845589ed6 net: openvswitch: silence suspicious RCU usage warning
ovs_flow_tbl_destroy always is called from RCU callback
or error path. It is no need to check if rcu_read_lock
or lockdep_ovsl_is_held was held.

ovs_dp_cmd_fill_info always is called with ovs_mutex,
So use the rcu_dereference_ovsl instead of rcu_dereference
in ovs_flow_tbl_masks_cache_size.

Fixes: 9bf24f594c ("net: openvswitch: make masks cache size configurable")
Cc: Eelco Chaudron <echaudro@redhat.com>
Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:11:46 -07:00
Eelco Chaudron
9bf24f594c net: openvswitch: make masks cache size configurable
This patch makes the masks cache size configurable, or with
a size of 0, disable it.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Eelco Chaudron
9d2f627b7e net: openvswitch: add masks cache hit counter
Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Peilin Ye
9aba6c5b49 openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()
ovs_ct_put_key() is potentially copying uninitialized kernel stack memory
into socket buffers, since the compiler may leave a 3-byte hole at the end
of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix
it by initializing `orig` with memset().

Fixes: 9dd7f8907c ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:09:44 -07:00
wenxu
038ebb1a71 net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
When openvswitch conntrack offload with act_ct action. Fragment packets
defrag in the ingress tc act_ct action and miss the next chain. Then the
packet pass to the openvswitch datapath without the mru. The over
mtu packet will be dropped in output action in openvswitch for over mtu.

"kernel: net2: dropped over-mtu packet: 1528 > 1500"

This patch add mru in the tc_skb_ext for adefrag and miss next chain
situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
to the qdisc_skb_cb when the packet defrag. And When the chain miss,
The mru is set to tc_skb_ext which can be got by ovs datapath.

Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:04:48 -07:00
Eelco Chaudron
a65878d6f0 net: openvswitch: fixes potential deadlock in dp cleanup code
The previous patch introduced a deadlock, this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.

Fixes: eac87c413b ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: Paolo <pabeni@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 16:58:38 -07:00
Eelco Chaudron
eac87c413b net: openvswitch: reorder masks array based on usage
This patch reorders the masks array every 4 seconds based on their
usage count. This greatly reduces the masks per packet hit, and
hence the overall performance. Especially in the OVS/OVN case for
OpenShift.

Here are some results from the OVS/OVN OpenShift test, which use
8 pods, each pod having 512 uperf connections, each connection
sends a 64-byte request and gets a 1024-byte response (TCP).
All uperf clients are on 1 worker node while all uperf servers are
on the other worker node.

Kernel without this patch     :  7.71 Gbps
Kernel with this patch applied: 14.52 Gbps

We also run some tests to verify the rebalance activity does not
lower the flow insertion rate, which does not.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Andrew Theurer <atheurer@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 10:36:50 -07:00
Andrew Lunn
9667851423 net: openvswitch: kerneldoc fixes
Simple fixes which require no deep knowledge of the code.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:20:40 -07:00
Lorenzo Bianconi
1784365570 openvswitch: take into account de-fragmentation/gso_size in execute_check_pkt_len
ovs connection tracking module performs de-fragmentation on incoming
fragmented traffic. Take info account if traffic has been de-fragmented
in execute_check_pkt_len action otherwise we will perform the wrong
nested action considering the original packet size. This issue typically
occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.
Moreover take into account GSO fragment size for GSO packet in
execute_check_pkt_len routine

Fixes: 4d5ec89fc8 ("net: openvswitch: Add a new action check_pkt_len")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24 14:34:58 -07:00
Masahiro Yamada
a7f7f6248d treewide: replace '---help---' in Kconfig files with 'help'
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---'    (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-14 01:57:21 +09:00
Tonghao Zhang
659d4587fe net: openvswitch: use div_u64() for 64-by-32 divisions
Compile the kernel for arm 32 platform, the build warning found.
To fix that, should use div_u64() for divisions.
| net/openvswitch/meter.c:396: undefined reference to `__udivdi3'

[add more commit msg, change reported tag, and use div_u64 instead
of do_div by Tonghao]

Fixes: e57358873b ("net: openvswitch: use u64 for meter bucket")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:48:21 -07:00
Tonghao Zhang
4b36a0dff7 net: openvswitch: suitable access to the dp_meters
To fix the following sparse warning:
| net/openvswitch/meter.c:109:38: sparse: sparse: incorrect type
| in assignment (different address spaces) ...
| net/openvswitch/meter.c:720:45: sparse: sparse: incorrect type
| in argument 1 (different address spaces) ...

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:48:21 -07:00
David S. Miller
d483389678 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Simple overlapping changes to linux/vermagic.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:18:53 -07:00
Tonghao Zhang
e57358873b net: openvswitch: use u64 for meter bucket
When setting the meter rate to 4+Gbps, there is an
overflow, the meters don't work as expected.

Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 18:26:11 -07:00
Tonghao Zhang
c773500890 net: openvswitch: make EINVAL return value more obvious
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 18:26:11 -07:00
Tonghao Zhang
a8e387384f net: openvswitch: remove the unnecessary check
Before invoking the ovs_meter_cmd_reply_stats, "meter"
was checked, so don't check it agin in that function.

Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 18:26:11 -07:00
Tonghao Zhang
eb58eebc7f net: openvswitch: set max limitation to meters
Don't allow user to create meter unlimitedly, which may cause
to consume a large amount of kernel memory. The max number
supported is decided by physical memory and 20K meters as default.

Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 18:26:11 -07:00
Tonghao Zhang
c7c4c44c9a net: openvswitch: expand the meters supported number
In kernel datapath of Open vSwitch, there are only 1024
buckets of meter in one datapath. If installing more than
1024 (e.g. 8192) meters, it may lead to the performance drop.
But in some case, for example, Open vSwitch used as edge
gateway, there should be 20K at least, where meters used for
IP address bandwidth limitation.

[Open vSwitch userspace datapath has this issue too.]

For more scalable meter, this patch use meter array instead of
hash tables, and expand/shrink the array when necessary. So we
can install more meters than before in the datapath.
Introducing the struct *dp_meter_instance, it's easy to
expand meter though changing the *ti point in the struct
*dp_meter_table.

Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 18:26:11 -07:00
Tonghao Zhang
27de77cec9 net: openvswitch: ovs_ct_exit to be done under ovs_lock
syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20 10:53:54 -07:00
Tonghao Zhang
64948427a6 net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry
The struct sw_flow is protected by RCU, when traversing them,
use hlist_for_each_entry_rcu.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02 17:48:12 -07:00
Cambda Zhu
a08e7fd912 net: Fix typo of SKB_SGO_CB_OFFSET
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d45b ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:53:18 -07:00
David S. Miller
1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
Jakub Kicinski
b5ab1f1be6 openvswitch: add missing attribute validation for hash
Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7c4 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03 13:28:48 -08:00
David S. Miller
e65ee2fb54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflict resolution of ice_virtchnl_pf.c based upon work by
Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21 13:39:34 -08:00
Kees Cook
16a556eeb7 openvswitch: Distribute switch variables for initialization
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

net/openvswitch/flow_netlink.c: In function ‘validate_set’:
net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable]
 2711 |  const struct ovs_key_ipv4 *ipv4_key;
      |                             ^~~~~~~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20 10:00:19 -08:00
Madhuparna Bhowmik
a2cfb96cc3 flow_table.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
53742e69e8 datapath.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
fed48423f1 vport.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:27 -08:00
Madhuparna Bhowmik
7790614616 meter.c: Use built-in RCU list checking
hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18 12:46:26 -08:00
Matteo Croce
744676e777 openvswitch: add TTL decrement action
New action to decrement TTL instead of setting it to a fixed value.
This action will decrement the TTL and, in case of expired TTL, drop it
or execute an action passed via a nested attribute.
The default TTL expired action is to drop the packet.

Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.

Tested with a corresponding change in the userspace:

    # ovs-dpctl dump-flows
    in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1
    in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2
    in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
    in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1

    # ping -c1 192.168.0.2 -t 42
    IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
    # ping -c1 192.168.0.2 -t 120
    IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
    # ping -c1 192.168.0.2 -t 1
    #

Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16 19:34:44 -08:00
Jason A. Donenfeld
2cec4448db net: openvswitch: use skb_list_walk_safe helper for gso segments
This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:48:41 -08:00
Martin Varghese
f66b53fdbb openvswitch: New MPLS actions for layer 2 tunnelling
The existing PUSH MPLS action inserts MPLS header between ethernet header
and the IP header. Though this behaviour is fine for L3 VPN where an IP
packet is encapsulated inside a MPLS tunnel, it does not suffice the L2
VPN (l2 tunnelling) requirements. In L2 VPN the MPLS header should
encapsulate the ethernet packet.

The new mpls action ADD_MPLS inserts MPLS header at the start of the
packet or at the start of the l3 header depending on the value of l3 tunnel
flag in the ADD_MPLS arguments.

POP_MPLS action is extended to support ethertype 0x6558.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24 22:24:45 -08:00
Pankaj Bharadiya
c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Martin Varghese
d04ac224b1 net: Fixed updating of ethertype in skb_mpls_push()
The skb_mpls_push was not updating ethertype of an ethernet packet if
the packet was originally received from a non ARPHRD_ETHER device.

In the below OVS data path flow, since the device corresponding to
port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
not update the ethertype of the packet even though the previous
push_eth action had added an ethernet header to the packet.

recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4

Fixes: 8822e270d6 ("net: core: move push MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04 17:11:25 -08:00
Aaron Conole
5d50aa83e2 openvswitch: support asymmetric conntrack
The openvswitch module shares a common conntrack and NAT infrastructure
exposed via netfilter.  It's possible that a packet needs both SNAT and
DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
this because it runs through the NAT table twice - once on ingress and
again after egress.  The openvswitch module doesn't have such capability.

Like netfilter hook infrastructure, we should run through NAT twice to
keep the symmetry.

Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04 16:31:15 -08:00
Martin Varghese
040b5cfbce Fixed updating of ethertype in function skb_mpls_pop
The skb_mpls_pop was not updating ethertype of an ethernet packet if the
packet was originally received from a non ARPHRD_ETHER device.

In the below OVS data path flow, since the device corresponding to port 7
is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
the ethertype of the packet even though the previous push_eth action had
added an ethernet header to the packet.

recirc_id(0),in_port(7),eth_type(0x8847),
mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
pop_mpls(eth_type=0x800),4

Fixes: ed246cee09 ("net: core: move pop MPLS functionality from OvS to core helper")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-02 13:03:50 -08:00
Paolo Abeni
8a574f8665 openvswitch: remove another BUG_ON()
If we can't build the flow del notification, we can simply delete
the flow, no need to crash the kernel. Still keep a WARN_ON to
preserve debuggability.

Note: the BUG_ON() predates the Fixes tag, but this change
can be applied only after the mentioned commit.

v1 -> v2:
 - do not leak an skb on error

Fixes: aed067783e ("openvswitch: Minimize ovs_flow_cmd_del critical section.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01 13:21:24 -08:00
Paolo Abeni
8ffeb03fbb openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
All the callers of ovs_flow_cmd_build_info() already deal with
error return code correctly, so we can handle the error condition
in a more gracefull way. Still dump a warning to preserve
debuggability.

v1 -> v2:
 - clarify the commit message
 - clean the skb and report the error (DaveM)

Fixes: ccb1352e76 ("net: Add Open vSwitch kernel components.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01 13:21:24 -08:00
Paolo Abeni
4e81c0b3fa openvswitch: fix flow command message size
When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant
flow has no UFID, we can exceed the computed size, as
ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY
attribute.
Take the above in account when computing the flow command message
size.

Fixes: 74ed7ab926 ("openvswitch: Add support for unique flow IDs.")
Reported-by: Qi Jun Ding <qding@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26 15:13:46 -08:00
Tonghao Zhang
61ca533c0e net: openvswitch: don't call pad_packet if not necessary
The nla_put_u16/nla_put_u32 makes sure that
*attrlen is align. The call tree is that:

nla_put_u16/nla_put_u32
  -> nla_put		attrlen = sizeof(u16) or sizeof(u32)
  -> __nla_put		attrlen
  -> __nla_reserve	attrlen
  -> skb_put(skb, nla_total_size(attrlen))

nla_total_size returns the total length of attribute
including padding.

Cc: Joe Stringer <joe@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15 12:43:27 -08:00
Tonghao Zhang
bd1903b7c4 net: openvswitch: add hash info to upcall
When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 17:29:46 -08:00
Tonghao Zhang
90ce9f23a8 net: openvswitch: select vport upcall portid directly
The commit 69c51582ff786 ("dpif-netlink: don't allocate per
thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
changed the number of allocated sockets to just one per port
by moving the socket array from a per handler structure to
a per datapath one. In the kernel datapath, a vport will have
only one socket in most case, if so select it directly in
fast-path.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 21:25:19 -08:00
Martin Varghese
fbdcdd78da Change in Openvswitch to support MPLS label depth of 3 in ingress direction
The openvswitch was supporting a MPLS label depth of 1 in the ingress
direction though the userspace OVS supports a max depth of 3 labels.
This change enables openvswitch module to support a max depth of
3 labels in the ingress.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05 18:02:29 -08:00
Tonghao Zhang
eec62eadd1 net: openvswitch: simplify the ovs_dp_cmd_new
use the specified functions to init resource.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00
Tonghao Zhang
4c76bf696a net: openvswitch: don't unlock mutex when changing the user_features fails
Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.

Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03 17:18:04 -08:00