Commit graph

796826 commits

Author SHA1 Message Date
James Prestwood
99e3a44bac mac80211_hwsim: allow setting iftype support
The mac80211_hwsim driver hard codes its supported interface types. For
testing purposes it would be valuable to allow changing these supported
types in order to simulate actual drivers than support a limited set of
iftypes. A new attribute was added to allow this:

- HWSIM_ATTR_IFTYPE_SUPPORT
	A u32 bit field of supported NL80211_IFTYPE_* bits

This will only enable/disable iftypes that mac80211_hwsim already
supports.

In order to accomplish this, the ieee80211_iface_limit structure needed
to be built dynamically to only include limit rules for iftypes that
the user requested to enable.

Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
[fix some indentation, add netlink error string]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:33:40 +01:00
Johannes Berg
2f98abb17d mac80211_hwsim: move HWSIM_ATTR_RADIO_NAME parsing last
Avoid the need to kfree() the name in many places by moving
the name parsing last.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:32:39 +01:00
Andrew Zaborowski
3d1a5bbfaf nl80211: Emit a SET_INTERFACE on iftype change
Let userspace learn about iftype changes by sending a notification
when handling the NL80211_CMD_SET_INTERFACE command.  There seems
to be no other place where the iftype can change: nl80211_set_interface
is the only caller of cfg80211_change_iface which is the only caller of
ops->change_virtual_intf.

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:22:10 +01:00
Martin Willi
c90b670b5c nl80211: announce radios/interfaces when switching namespaces
When a wiphy changes its namespace, all interfaces are moved to the
new namespace as well. The network interfaces are properly announced
as leaving on the old and as appearing on the new namespace through
RTM_NEWLINK/RTM_DELLINK. On nl80211, however, these events are missing
for radios and their interfaces.

Add netlink announcements through nl80211 when switching namespaces,
so userspace can rely on these events to discover radios properly.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:21:10 +01:00
Johannes Berg
cee7013be9 mac80211: allow drivers to use peer measurement API
There's nothing much for mac80211 to do, so only pass through
the requests with minimal checks and tracing. The driver must
call cfg80211's results APIs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:20:34 +01:00
Johannes Berg
9bb7e0f24e cfg80211: add peer measurement with FTM initiator API
Add a new "peer measurement" API, that can be used to measure
certain things related to a peer. Right now, only implement
FTM (flight time measurement) over it, but the idea is that
it'll be extensible to also support measuring the necessary
things to calculate e.g. angle-of-arrival for WiGig.

The API is structured to have a generic list of peers and
channels to measure with/on, and then for each of those a
set of measurements (again, only FTM right now) to perform.

Results are sent to the requesting socket, including a final
complete message.

Closing the controlling netlink socket will abort a running
measurement.

v3:
 - add a bit to report "final" for partial results
 - remove list keeping etc. and just unicast out the results
   to the requester (big code reduction ...)
 - also send complete message unicast, and as a result
   remove the multicast group
 - separate out struct cfg80211_pmsr_ftm_request_peer
   from struct cfg80211_pmsr_request_peer
 - document timeout == 0 if no timeout
 - disallow setting timeout nl80211 attribute to 0,
   must not include attribute for no timeout
 - make MAC address randomization optional
 - change num bursts exponent default to 0 (1 burst, rather
   rather than the old default of 15==don't care)

v4:
 - clarify NL80211_ATTR_TIMEOUT documentation

v5:
 - remove unnecessary nl80211 multicast/family changes
 - remove partial results bit/flag, final is sufficient
 - add max_bursts_exponent, max_ftms_per_burst to capability
 - rename "frames per burst" -> "FTMs per burst"

v6:
 - rename cfg80211_pmsr_free_wdev() to cfg80211_pmsr_wdev_down()
   and call it in leave, so the device can't go down with any
   pending measurements

v7:
 - wording fixes (Lior)
 - fix ftm.max_bursts_exponent to allow having the limit of 0 (Lior)

v8:
 - copyright statements
 - minor coding style fixes
 - fix error path leak

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:20:13 +01:00
Johannes Berg
801f87469e netlink: add nl_set_extack_cookie_u64()
Add a helper function nl_set_extack_cookie_u64() to use a u64 as
the netlink extended ACK cookie, to avoid having to open-code it
in any users of the cookie.

A u64 should be sufficient for most subsystems though we allow
for up to 20 bytes right now. This also matches the cookies in
nl80211 where I intend to use this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:20:07 +01:00
Johannes Berg
e0ba709543 mac80211: tx: avoid variable shadowing
We have a bool and an __le16 called qos, rename the inner __le16
to 'qoshdr' to make it more obvious and to avoid sparse warnings.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:17:25 +01:00
Johannes Berg
63c713e1e8 mac80211: debugfs: avoid variable shadowing
We have a macro here that uses an inner variable 'i' that
also exists in the outer scope - use '_i' in the macro.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:14:51 +01:00
Johannes Berg
6af8354f1d mac80211: sta_info: avoid tidstats variable shadowing
We have a pointer called 'tidstats' that shadows a bool function
argument with the same name, but we actually only use it once so
just remove the pointer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:13:15 +01:00
Johannes Berg
140d905b25 mac80211: tracing: avoid 'idx' variable
This variable shadows something that gets generated inside
the tracing macros, which causes sparse to warn. Avoid it
so sparse output is more readable, even if it doesn't seem
to cause any trouble.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:11:36 +01:00
Johannes Berg
aaaa10e01d cfg80211: tracing: avoid 'idx' variable
This variable shadows something that gets generated inside
the tracing macros, which causes sparse to warn. Avoid it
so sparse output is more readable, even if it doesn't seem
to cause any trouble.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-11-09 11:10:47 +01:00
Edward Cree
29e1220717 sfc: use the new __netdev_tx_sent_queue BQL optimisation
As added in 3e59020abf ("net: bql: add __netdev_tx_sent_queue()"), which
 see for performance rationale.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 20:01:29 -08:00
David S. Miller
eb4149c9a5 Merge branch 'net-Remove-VLAN_TAG_PRESENT-from-drivers'
Michał Mirosław says:

====================
net: Remove VLAN_TAG_PRESENT from drivers

This series removes VLAN_TAG_PRESENT use from network drivers in
preparation to removing its special meaning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:49:32 -08:00
Michał Mirosław
f4f9a5e6cc gianfar: remove use of VLAN_TAG_PRESENT
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:49:32 -08:00
Michał Mirosław
9df46aefaf OVS: remove use of VLAN_TAG_PRESENT
This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:49:31 -08:00
Michał Mirosław
f723a1a293 cnic: remove use of VLAN_TAG_PRESENT
This just removes VLAN_TAG_PRESENT use.  VLAN TCI=0 special meaning is
deeply embedded in the driver code and so is left as is.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:49:31 -08:00
Michał Mirosław
1ef212afa4 i40iw: remove use of VLAN_TAG_PRESENT
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:49:31 -08:00
Ilias Apalodimas
0d404a6128 net: socionext: refactor netsec_alloc_dring()
return -ENOMEM directly instead of assigning it in a variable

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:42:41 -08:00
Ilias Apalodimas
4acb20b462 net: socionext: different approach on DMA
Current driver dynamically allocates an skb and maps it as DMA Rx
buffer. In order to prepare for upcoming XDP changes, let's introduce a
different allocation scheme.
Buffers are allocated dynamically and mapped into hardware.
During the Rx operation the driver uses build_skb() to produce the
necessary buffers for the network stack.
This change increases performance ~15% on 64b packets with smmu disabled
and ~5% with smmu enabled

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:42:41 -08:00
Stefan Wahren
026b907d58 net: qca_spi: Add available buffer space verification
Interferences on the SPI line could distort the response of
available buffer space. So at least we should check that the
response doesn't exceed the maximum available buffer space.
In error case increase a new error counter and retry it later.
This behavior avoids buffer errors in the QCA7000, which
results in an unnecessary chip reset including packet loss.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:41:01 -08:00
David Barmann
50254256f3 sock: Reset dst when changing sk_mark via setsockopt
When setting the SO_MARK socket option, if the mark changes, the dst
needs to be reset so that a new route lookup is performed.

This fixes the case where an application wants to change routing by
setting a new sk_mark.  If this is done after some packets have already
been sent, the dst is cached and has no effect.

Signed-off-by: David Barmann <david.barmann@stackpath.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 19:36:13 -08:00
David S. Miller
52358cb5a3 Merge branch 's390-qeth-next'
Julian Wiedmann says:

====================
s390/qeth: updates 2018-11-08

please apply the following qeth patches to net-next.

The first patch allows one more device type to query the FW for a MAC address,
the others are all basically just removal of duplicated or unused code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:24 -08:00
Julian Wiedmann
ded9da1fc2 s390/qeth: don't process hsuid in qeth_l3_setup_netdev()
qeth_l3_setup_netdev() checks if the hsuid attribute is set on the qeth
device, and propagates it to the net_device. In the past this was needed
to pick up any hsuid that was set before allocation of the net_device.

With commit d3d1b205e8 ("s390/qeth: allocate netdevice early") this
is no longer necessary, qeth_l3_dev_hsuid_store() always stores the
hsuid straight into dev->perm_addr.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:24 -08:00
Julian Wiedmann
9168f5ae38 s390/qeth: remove unused fallback in Layer3's MAC code
If the CREATE ADDR sent by qeth_l3_iqd_read_initial_mac() fails, its
callback sets a random MAC address on the net_device. The error then
propagates back, and qeth_l3_setup_netdev() bails out without
registering the net_device.

Any subsequent call to qeth_l3_setup_netdev() will then attempt a fresh
CREATE ADDR which either 1) also fails, or 2) sets a proper MAC address
on the net_device. Consequently, the net_device will never be registered
with a random MAC and we can drop the fallback code.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:24 -08:00
Julian Wiedmann
4fa55fa94f s390/qeth: remove two IPA command helpers
qeth_l3_send_ipa_arp_cmd() is merely a wrapper around
qeth_send_control_data() now. So push the length adjustment into
QETH_SETASS_BASE_LEN, and remove the wrapper. While at it, also remove
some redundant 0-initializations.

qeth_send_setassparms() requires that callers prepare their command
parameters, so that they can be copied into the parameter area in one
go. Skip the indirection, and just let callers set up the command
themselves.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:24 -08:00
Julian Wiedmann
605c9d5f58 s390/qeth: replace open-coded cmd setup
Call qeth_prepare_ipa_cmd() during setup of a new IPA cmd buffer, so
that it is used for all commands. Thus ARP and SNMP requests don't have
to do their own initialization.

This will now also set the proper MPC protocol version for SNMP requests
on L2 devices.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:23 -08:00
Julian Wiedmann
d7d18da1f7 s390/qeth: remove card list
Re-implement the card-by-RDEV lookup by using device model concepts, and
remove the now redundant list of all qeth card instances in the system.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:23 -08:00
Julian Wiedmann
81ec543939 s390/qeth: unify transmit code
Since commit 82bf5c0867 ("s390/qeth: add support for IPv6 TSO"),
qeth_xmit() also knows how to build TSO packets and is practically
identical to qeth_l3_xmit().
Convert qeth_l3_xmit() into a thin wrapper that merely strips the
L2 header off a packet, and calls qeth_xmit() for the actual
TX processing.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:23 -08:00
Julian Wiedmann
5a541f6d00 s390/qeth: handle af_iucv skbs in qeth_l3_fill_header()
Filling the HW header from one single function will make it easier to
rip out all the duplicated transmit code in qeth_l3_xmit(). On top, this
saves one conditional branch in the TSO path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:23 -08:00
Julian Wiedmann
b144b99fff s390/qeth: utilize virtual MAC for Layer2 OSD devices
By default, READ MAC on a Layer2 OSD device returns the adapter's
burnt-in MAC address. Given the default scenario of many virtual devices
on the same adapter, qeth can't make any use of this address and
therefore skips the READ MAC call for this device type.

But in some configurations, the READ MAC command for a Layer2 OSD device
actually returns a pre-provisioned, virtual MAC address. So enable the
READ MAC code to detect this situation, and let the L2 subdriver
call READ MAC for OSD devices.

This also removes the QETH_LAYER2_MAC_READ flag, which protects L2
devices against calling READ MAC multiple times. Instead protect the
whole call to qeth_l2_request_initial_mac().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:22:23 -08:00
Li RongQing
04087d9a89 openvswitch: remove BUG_ON from get_dpdev
if local is NULL pointer, and the following access of local's
dev will trigger panic, which is same as BUG_ON

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:14:59 -08:00
David S. Miller
20da4ef91c Merge branch 'ICMP-error-handling-for-UDP-tunnels'
Stefano Brivio says:

====================
ICMP error handling for UDP tunnels

This series introduces ICMP error handling for UDP tunnels and
encapsulations and related selftests. We need to handle ICMP errors to
support PMTU discovery and route redirection -- this support is entirely
missing right now:

- patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
  the same destination port on both endpoints -- i.e. VXLAN and GENEVE
- patches 2/11 to 7/11 are specific to VxLAN and GENEVE
- patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
  where sent packets cannot be matched via receiving socket lookup, i.e.
  FoU and GUE
- patches 10/11 and 11/11 are specific to FoU and GUE

v2: changes are listed in the single patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:09 -08:00
Stefano Brivio
56fd865f46 selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests
Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload,
on IPv4 and IPv6 transport, that check that PMTU exceptions are created
with the right value when exceeding the MTU on a link of the path.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
b8a51b38e4 fou, fou6: ICMP error handlers for FoU and GUE
As the destination port in FoU and GUE receiving sockets doesn't
necessarily match the remote destination port, we can't associate errors
to the encapsulating tunnels with a socket lookup -- we need to blindly
try them instead. This means we don't even know if we are handling errors
for FoU or GUE without digging into the packets.

Hence, implement a single handler for both, one for IPv4 and one for IPv6,
that will check whether the packet that generated the ICMP error used a
direct IP encapsulation or if it had a GUE header, and send the error to
the matching protocol handler, if any.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
e7cc082455 udp: Support for error handlers of tunnels with arbitrary destination port
ICMP error handling is currently not possible for UDP tunnels not
employing a receiving socket with local destination port matching the
remote one, because we have no way to look them up.

Add an err_handler tunnel encapsulation operation that can be exported by
tunnels in order to pass the error to the protocol implementing the
encapsulation. We can't easily use a lookup function as we did for VXLAN
and GENEVE, as protocol error handlers, which would be in turn called by
implementations of this new operation, handle the errors themselves,
together with the tunnel lookup.

Without a socket, we can't be sure which encapsulation error handler is
the appropriate one: encapsulation handlers (the ones for FoU and GUE
introduced in the next patch, e.g.) will need to check the new error codes
returned by protocol handlers to figure out if errors match the given
encapsulation, and, in turn, report this error back, so that we can try
all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.

v2:
- Name all arguments in err_handler prototypes (David Miller)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
32bbd8793f net: Convert protocol error handlers from void to int
We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
ce7336610c selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a025fb5f49 geneve: Allow configuration of DF behaviour
draft-ietf-nvo3-geneve-08 says:

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Now that ICMP error handling is working for GENEVE, we can comply with
this recommendation.

Make this configurable, though, to avoid breaking existing setups. By
default, DF won't be set. It can be set or inherited from inner IPv4
packets. If it's configured to be inherited and we are encapsulating IPv6,
it will be set.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, apply DF
  setting only if (!geneve->collect_md) in geneve_xmit_skb()
  (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a07966447f geneve: ICMP error lookup handler
Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
582888792f selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6
Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Change all occurrences of VxLAN to VXLAN (Jiri Benc)
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
b4d3069783 vxlan: Allow configuration of DF behaviour
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.

For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.

According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.

Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, move DF
  setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
c3a43b9fec vxlan: ICMP error lookup handler
Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Stefano Brivio
a36e185e8c udp: Handle ICMP errors for tunnels with same destination port on both endpoints
For both IPv4 and IPv6, if we can't match errors to a socket, try
tunnels before ignoring them. Look up a socket with the original source
and destination ports as found in the UDP packet inside the ICMP payload,
this will work for tunnels that force the same destination port for both
endpoints, i.e. VXLAN and GENEVE.

Actually, lwtunnels could break this assumption if they are configured by
an external control plane to have different destination ports on the
endpoints: in this case, we won't be able to trace ICMP messages back to
them.

For IPv6 redirect messages, call ip6_redirect() directly with the output
interface argument set to the interface we received the packet from (as
it's the very interface we should build the exception on), otherwise the
new nexthop will be rejected. There's no such need for IPv4.

Tunnels can now export an encap_err_lookup() operation that indicates a
match. Pass the packet to the lookup function, and if the tunnel driver
reports a matching association, continue with regular ICMP error handling.

v2:
- Added newline between network and transport header sets in
  __udp{4,6}_lib_err_encap() (David Miller)
- Removed redundant skb_reset_network_header(skb); in
  __udp4_lib_err_encap()
- Removed redundant reassignment of iph in __udp4_lib_err_encap()
  (Sabrina Dubroca)
- Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
  won't work with lwtunnels configured to use asymmetric ports. By the way,
  it's VXLAN, not VxLAN (Jiri Benc)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:13:08 -08:00
Colin Ian King
141b95d551 net: hns3: fix spelling mistake, "assertting" -> "asserting"
Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:07:56 -08:00
Ganesh Goudar
6d444c4efc cxgb4: Add new T6 PCI device ids 0x608a
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:05:20 -08:00
Li RongQing
1c51dc9ad6 net/ipv6: compute anycast address hash only if dev is null
avoid to compute the hash value if dev is not null, since
hash value is not used

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 17:04:43 -08:00
YueHaibing
0db55093b5 net: bcmgenet: return correct value 'ret' from bcmgenet_power_down
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c: In function 'bcmgenet_power_down':
drivers/net/ethernet/broadcom/genet/bcmgenet.c:1136:6: warning:
 variable 'ret' set but not used [-Wunused-but-set-variable]

bcmgenet_power_down should return 'ret' instead of 0.

Fixes: ca8cf34190 ("net: bcmgenet: propagate errors from bcmgenet_power_down")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:21:41 -08:00
David S. Miller
3ed3857011 Merge branch 'net-sched-prepare-for-more-Qdisc-offloads'
Jakub Kicinski says:

====================
net: sched: prepare for more Qdisc offloads

This series refactors the "switchdev" Qdisc offloads a little.  We have
a few Qdiscs which can be fully offloaded today to the forwarding plane
of switching devices.

First patch adds a helper for handing statistic dumps, the code seems
to be copy pasted between PRIO and RED.  Second patch removes unnecessary
parameter from RED offload function.  Third patch makes the MQ offload
use the dump helper which helps it behave much like PRIO and RED when
it comes to the TCQ_F_OFFLOADED flag.  Patch 4 adds a graft helper,
similar to the dump helper.

Patch 5 is unrelated to offloads, qdisc_graft() code seemed ripe for a
small refactor - no functional changes there.

Last two patches move the qdisc_put() call outside of the sch_tree_lock
section for RED and PRIO.  The child Qdiscs will get removed from the
hierarchy under the lock, but having the put (and potentially destroy)
called outside of the lock helps offload which may choose to sleep,
and it should generally lower the Qdisc change impact.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00
Jakub Kicinski
7b8e0b6e65 net: sched: prio: delay destroying child qdiscs on change
Move destroying of the old child qdiscs outside of the sch_tree_lock()
section.  This should improve the software qdisc replace but is even
more important for offloads.  Calling offloads under a spin lock is
best avoided, and child's destroy would be called under sch_tree_lock().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 16:19:48 -08:00