Commit Graph

38900 Commits

Author SHA1 Message Date
Sara Sharon babc305e21 mac80211: reset CQM history upon reconfiguration
The current behavior of notifying CQM events is inconsistent:
Upon first configuration there is a cqm event with the current
status according to threshold configured, regardless of signal
stability.
When there is reconfiguration no event is sent unless there is
a significant change to the signal level according to the new
configuration.

Since the current reconfiguration behavior might cause missing
CQM events in case the current signal did not change but is on
the other side of the new threshold, fix that by resetting the
stored signal level upon reconfiguration.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22 15:22:50 +02:00
Johannes Berg 2df1b131b5 mac80211: fix VHT MCS mask array overrun
The HT MCS mask has 9 bytes, the VHT one only has 8 streams.
Split the loops to handle this correctly.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22 15:19:48 +02:00
Johannes Berg ef9be10c8c mac80211: reject software RSSI CQM with beacon filtering
When beacon filtering is enabled the mac80211 software implementation
for RSSI CQM cannot work as beacons will not be available. Rather than
accepting such a configuration without proper effect, reject it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 15:23:22 +02:00
Arik Nemtsov 52a45f38ca mac80211: avoid VHT usage with no 80MHz chans allowed
Currently if 80MHz channels are not allowed for use, the VHT IE is not
included in the probe request for an AP. This is not good enough if the
AP is configured with the wrong regulatory and supports VHT even where
prohibited or in TDLS scenarios.
Mark the ifmgd with the DISABLE_VHT flag for the misbehaving-AP case, and
unset VHT support from the peer-station entry for the TDLS case.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 14:31:41 +02:00
Maciej S. Szmigiero 549cc1c560 cfg80211: regulatory: restore proper user alpha2
restore_regulatory_settings() should restore alpha2
as computed in restore_alpha2(), not raw user_alpha2 to
behave as described in the comment just above that code.

This fixes endless loop of calling CRDA for "00" and "97"
countries after resume from suspend on my laptop.

Looks like others had the same problem, too:
http://ath9k-devel.ath9k.narkive.com/knY5W6St/ath9k-and-crda-messages-in-logs
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/899335
https://forum.porteus.org/viewtopic.php?t=4975&p=36436
https://forums.opensuse.org/showthread.php/483356-Authentication-Regulatory-Domain-issues-ath5k-12-2

Signed-off-by: Maciej Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 14:29:25 +02:00
João Paulo Rechi Vita 4c0778933a rfkill: Copy "all" global state to other types
When switching the state of all RFKill switches of type all we need to
replicate the RFKILL_TYPE_ALL global state to all the other types global
state, so it is used to initialize persistent RFKill switches on
register.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 14:26:56 +02:00
Avri Altman 22f66895e6 mac80211: protect non-HT BSS when HT TDLS traffic exists
HT TDLS traffic should be protected in a non-HT BSS to avoid
collisions. Therefore, when TDLS peers join/leave, check if
protection is (now) needed and set the ht_operation_mode of
the virtual interface according to the HT capabilities of the
TDLS peer(s).

This works because a non-HT BSS connection never sets (or
otherwise uses) the ht_operation_mode; it just means that
drivers must be aware that this field applies to all HT
traffic for this virtual interface, not just the traffic
within the BSS. Document that.

Signed-off-by: Avri Altman <avri.altman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 14:25:46 +02:00
Thierry Reding 98a1f8282b mac80211: Do not use sizeof() on pointer type
The rate_control_cap_mask() function takes a parameter mcs_mask, which
GCC will take to be u8 * even though it was declared with a fixed size.
This causes the following warning:

	net/mac80211/rate.c: In function 'rate_control_cap_mask':
	net/mac80211/rate.c:719:25: warning: 'sizeof' on array function parameter 'mcs_mask' will return size of 'u8 * {aka unsigned char *}' [-Wsizeof-array-argument]
	   for (i = 0; i < sizeof(mcs_mask); i++)
	                         ^
	net/mac80211/rate.c:684:10: note: declared here
	       u8 mcs_mask[IEEE80211_HT_MCS_MASK_LEN],
	          ^

This can be easily fixed by using the IEEE80211_HT_MCS_MASK_LEN directly
within the loop condition.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-04 14:23:08 +02:00
Linus Torvalds dd5cdb48ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Another merge window, another set of networking changes.  I've heard
  rumblings that the lightweight tunnels infrastructure has been voted
  networking change of the year.  But what do I know?

   1) Add conntrack support to openvswitch, from Joe Stringer.

   2) Initial support for VRF (Virtual Routing and Forwarding), which
      allows the segmentation of routing paths without using multiple
      devices.  There are some semantic kinks to work out still, but
      this is a reasonably strong foundation.  From David Ahern.

   3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

   4) Ignore route nexthops with a link down state in ipv6, just like
      ipv4.  From Andy Gospodarek.

   5) Remove spinlock from fast path of act_gact and act_mirred, from
      Eric Dumazet.

   6) Document the DSA layer, from Florian Fainelli.

   7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
      from Florian Fainelli.

   8) Add Mellanox Switch Driver and core infrastructure, from Jiri
      Pirko.

   9) Add support for "light weight tunnels", which allow for
      encapsulation and decapsulation without bearing the overhead of a
      full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
      others.

  10) Add Identifier Locator Addressing support for ipv6, from Tom
      Herbert.

  11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

  12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

  13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

  14) Stop using a zero TX queue length to mean that a device shouldn't
      have a qdisc attached, use an explicit flag instead.  From Phil
      Sutter.

  15) Use generic geneve netdevice infrastructure in openvswitch, from
      Pravin B Shelar.

  16) Add infrastructure to avoid re-forwarding a packet in software
      that was already forwarded by a hardware switch.  From Scott
      Feldman.

  17) Allow AF_PACKET fanout function to be implemented in a bpf
      program, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
  netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
  netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
  net: fec: clear receive interrupts before processing a packet
  ipv6: fix exthdrs offload registration in out_rt path
  xen-netback: add support for multicast control
  bgmac: Update fixed_phy_register()
  sock, diag: fix panic in sock_diag_put_filterinfo
  flow_dissector: Use 'const' where possible.
  flow_dissector: Fix function argument ordering dependency
  ixgbe: Resolve "initialized field overwritten" warnings
  ixgbe: Remove bimodal SR-IOV disabling
  ixgbe: Add support for reporting 2.5G link speed
  ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
  ixgbe: support for ethtool set_rxfh
  ixgbe: Avoid needless PHY access on copper phys
  ixgbe: cleanup to use cached mask value
  ixgbe: Remove second instance of lan_id variable
  ixgbe: use kzalloc for allocating one thing
  flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
  ixgbe: Remove unused PCI bus types
  ...
2015-09-03 08:08:17 -07:00
Daniel Borkmann 62da98656b netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
Fengguang reported, that some randconfig generated the following linker
issue with nf_ct_zone_dflt object involved:

  [...]
  CC      init/version.o
  LD      init/built-in.o
  net/built-in.o: In function `ipv4_conntrack_defrag':
  nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
  net/built-in.o: In function `ipv6_defrag':
  nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
  make: *** [vmlinux] Error 1

Given that configurations exist where we have a built-in part, which is
accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
area when netfilter is configured in general.

Therefore, split the more generic parts into a common header under
include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
section that already holds parts related to CONFIG_NF_CONNTRACK in the
netfilter core. This fixes the issue on my side.

Fixes: 308ac9143e ("netfilter: nf_conntrack: push zone object into functions")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 16:32:56 -07:00
Daniel Borkmann a82b0e6391 netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
While testing various Kconfig options on another issue, I found that
the following one triggers as well on allmodconfig and nf_conntrack
disabled:

  net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
  net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
    if (this_cpu_read(nf_skb_duplicated))
  [...]
  net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’:
  net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
    if (this_cpu_read(nf_skb_duplicated))

Fix it by including directly the header where it is defined.

Fixes: bbde9fc182 ("netfilter: factor out packet duplication for IPv4/IPv6")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 16:28:06 -07:00
Daniel Borkmann e41b0bedba ipv6: fix exthdrs offload registration in out_rt path
We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.

Fixes: 3336288a9f ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 15:31:00 -07:00
Daniel Borkmann b382c08656 sock, diag: fix panic in sock_diag_put_filterinfo
diag socket's sock_diag_put_filterinfo() dumps classic BPF programs
upon request to user space (ss -0 -b). However, native eBPF programs
attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method:

Their orig_prog is always NULL. However, sock_diag_put_filterinfo()
unconditionally tries to access its filter length resp. wants to copy
the filter insns from there. Internal cBPF to eBPF transformations
attached to sockets don't have this issue, as orig_prog state is kept.

It's currently only used by packet sockets. If we would want to add
native eBPF support in the future, this needs to be done through
a different attribute than PACKET_DIAG_FILTER to not confuse possible
user space disassemblers that work on diag data.

Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 11:29:29 -07:00
David S. Miller 20a17bf6c0 flow_dissector: Use 'const' where possible.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 21:19:17 -07:00
David S. Miller a17ace95b0 flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
These cannot live in net/core/flow.c which only builds when XFRM is
enabled.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 17:00:24 -07:00
David S. Miller 4b36993d3d flow_dissector: Don't use bit fields.
Just have a flags member instead.

   In file included from include/linux/linkage.h:4:0,
                    from include/linux/kernel.h:6,
                    from net/core/flow_dissector.c:1:
   In function 'flow_keys_hash_start',
       inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:553:34:
>> include/linux/compiler.h:447:38: error: call to '__compiletime_assert_459' declared with attribute error: BUILD_BUG_ON failed: FLOW_KEYS_HASH_OFFSET % sizeof(u32)

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 16:46:08 -07:00
Tom Herbert 6db61d79c1 flow_dissector: Ignore flow dissector return value from ___skb_get_hash
In ___skb_get_hash ignore return value from skb_flow_dissect_flow_keys.
A failure in that function likely means that there was a parse error,
so we may as well use whatever fields were found before the error was
hit.  This is also good because it means we won't keep trying to derive
the hash on subsequent calls to skb_get_hash for the same packet.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:23 -07:00
Tom Herbert 823b969395 flow_dissector: Add control/reporting of encapsulation
Add an input flag to flow dissector on rather dissection should stop
when encapsulation is detected (IP/IP or GRE). Also, add a key_control
flag that indicates encapsulation was encountered during the
dissection.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:23 -07:00
Tom Herbert 872b1abb1e flow_dissector: Add flag to stop parsing when an IPv6 flow label is seen
Add an input flag to flow dissector on rather dissection should be
stopped when a flow label is encountered. Presumably, the flow label
is derived from a sufficient hash of an inner transport packet so
further dissection is not needed (that is ports are not included in
the flow hash). Using the flow label instead of ports has the additional
benefit that packet fragments should hash to same value as non-fragments
for a flow (assuming that the same flow label is used).

We set this flag by default in for skb_get_hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:23 -07:00
Tom Herbert 8306b688f1 flow_dissector: Add flag to stop parsing at L3
Add an input flag to flow dissector on rather dissection should be
stopped when an L3 packet is encountered. This would be useful if a
caller just wanted to get IP addresses of the outermost header (e.g.
to do an L3 hash).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:23 -07:00
Tom Herbert b840f28b90 flow_dissector: Support IPv6 fragment header
Parse NEXTHDR_FRAGMENT. When seen account for it in the fragment bits of
key_control. Also, check if first fragment should be parsed.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
Tom Herbert 807e165dc4 flow_dissector: Add control/reporting of fragmentation
Add an input flag to flow dissector on rather dissection should be
attempted on a first fragment. Also add key_control flags to indicate
that a packet is a fragment or first fragment.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
Tom Herbert cd79a2382a flow_dissector: Add flags argument to skb_flow_dissector functions
The flags argument will allow control of the dissection process (for
instance whether to parse beyond L3).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
Tom Herbert a6e544b0a8 flow_dissector: Jump to exit code in __skb_flow_dissect
Instead of returning immediately (on a parsing failure for instance) we
jump to cleanup code. This always sets protocol values in key_control
(even on a failure there is still valid information in the key_tags that
was set before the problem was hit).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
Tom Herbert c6cc1ca7f4 flowi: Abstract out functions to get flow hash based on flowi
Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the
flow keys and hash based on flowi structures. These are called by
__skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created
get_hash_from_flowi6 and get_hash_from_flowi4 which can be called
when just the hash value for a flowi is needed.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
Tom Herbert bcc83839ff skbuff: Make __skb_set_sw_hash a general function
Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is
a common method (between __skb_set_sw_hash and skb_set_hash) to set
the hash in an skbuff.

Also, move skb_clear_hash to be closer to __skb_set_hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
David Ahern 9b8ff51822 net: Make table id type u32
A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.

Fixes:
4e3c89920c ("net: Introduce VRF related flags and helpers")
15be405eb2 ("net: Add inet_addr lookup by table")
30bbaa1950 ("net: Fix up inet_addr_type checks")
021dd3b8a1 ("net: Add routes to the table associated with the device")
dc028da54e ("inet: Move VRF table lookup to inlined function")
f6d3c19274 ("net: FIB tracepoints")

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 14:32:44 -07:00
Pravin B Shelar 63b6c13dbb tun_dst: Remove opts_size
opts_size is only written and never read. Following patch
removes this unused variable.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 21:23:42 -07:00
Marius Tomaschewski 2053aeb69a ipv6: send only one NEWLINK when RA causes changes
Signed-off-by: Marius Tomaschewski <mt@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 21:20:29 -07:00
Linus Torvalds d4c90396ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.3:

  API:

   - the AEAD interface transition is now complete.
   - add top-level skcipher interface.

  Drivers:

   - x86-64 acceleration for chacha20/poly1305.
   - add sunxi-ss Allwinner Security System crypto accelerator.
   - add RSA algorithm to qat driver.
   - add SRIOV support to qat driver.
   - add LS1021A support to caam.
   - add i.MX6 support to caam"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (163 commits)
  crypto: algif_aead - fix for multiple operations on AF_ALG sockets
  crypto: qat - enable legacy VFs
  MPI: Fix mpi_read_buffer
  crypto: qat - silence a static checker warning
  crypto: vmx - Fixing opcode issue
  crypto: caam - Use the preferred style for memory allocations
  crypto: caam - Propagate the real error code in caam_probe
  crypto: caam - Fix the error handling in caam_probe
  crypto: caam - fix writing to JQCR_MS when using service interface
  crypto: hash - Add AHASH_REQUEST_ON_STACK
  crypto: testmgr - Use new skcipher interface
  crypto: skcipher - Add top-level skcipher interface
  crypto: cmac - allow usage in FIPS mode
  crypto: sahara - Use dmam_alloc_coherent
  crypto: caam - Add support for LS1021A
  crypto: qat - Don't move data inside output buffer
  crypto: vmx - Fixing GHASH Key issue on little endian
  crypto: vmx - Fixing AES-CTR counter bug
  crypto: null - Add missing Kconfig tristate for NULL2
  crypto: nx - Add forward declaration for struct crypto_aead
  ...
2015-08-31 17:38:39 -07:00
Marius Tomaschewski a394eef562 ipv6: send NEWLINK on RA managed/otherconf changes
The kernel is applying the RA managed/otherconf flags silently and
forgets to send ifinfo notify to inform about their change when the
router provides a zero reachable_time and retrans_timer as dnsmasq
and many routers send it, which just means unspecified by this router
and the host should continue using whatever value it is already using.
Userspace may monitor the ifinfo notifications to activate dhcpv6.

Signed-off-by: Marius Tomaschewski <mt@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 15:13:08 -07:00
Andrew Lunn e448534668 net: dsa: Allow DSA and CPU ports to have a phy-mode property
It can be useful for DSA and CPU ports to have a phy-mode property, in
particular to specify RGMII delays. Parse the property and set it in
the fixed-link phydev.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 14:48:02 -07:00
Andrew Lunn 39b0c70519 net: dsa: Allow configuration of CPU & DSA port speeds/duplex
By default, DSA and CPU ports are configured to the maximum speed the
switch supports. However there can be use cases where the peer devices
port is slower. Allow a fixed-link property to be used with the DSA
and CPU port in the device tree, and use this information to configure
the port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 14:48:02 -07:00
Nikolay Aleksandrov 6ea3c9d5b0 mpls: fix mpls_net_init memory leak
Fix a memory leak in the mpls netns init function in case of failure. If
register_net_sysctl fails then we need to free the ctl_table.

Fixes: 7720c01f3f ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:45:09 -07:00
Daniel Borkmann c3a8d94746 tcp: use dctcp if enabled on the route to the initiator
Currently, the following case doesn't use DCTCP, even if it should:
A responder has f.e. Cubic as system wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.

We were thinking on how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: lets cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
to only do a single metric feature lookup inside tcp_ecn_create_request().

Joint work with Florian Westphal.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Daniel Borkmann b8d3e4163a fib, fib6: reject invalid feature bits
Feature bits that are invalid should not be accepted by the kernel,
only the lower 4 bits may be configured, but not the remaining ones.
Even from these 4, 2 of them are unused.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Daniel Borkmann 1bb14807bc net: fib6: reduce identation in ip6_convert_metrics
Reduce the identation a bit, there's no need to artificically have
it increased.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Florian Westphal 6cf9dfd3bd net: fib: move metrics parsing to a helper
fib_create_info() is already quite large, so before adding more
code to the metrics section move that to a helper, similar to
ip6_convert_metrics.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:34:00 -07:00
Pravin B Shelar 4c22279848 ip-tunnel: Use API to access tunnel metadata options.
Currently tun-info options pointer is used in few cases to
pass options around. But tunnel options can be accessed using
ip_tunnel_info_opts() API without using the pointer. Following
patch removes the redundant pointer and consistently make use
of API.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:28:56 -07:00
Madalin Bucur d1bfc62591 ipv4: fix 32b build
Address remaining issue after 80ec192.

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 11:32:41 -07:00
David S. Miller 80ec1927b1 ipv4: Fix 32-bit build.
net/ipv4/af_inet.c: In function 'snmp_get_cpu_field64':
>> net/ipv4/af_inet.c:1486:26: error: 'offt' undeclared (first use in this function)
      v = *(((u64 *)bhptr) + offt);
                             ^
   net/ipv4/af_inet.c:1486:26: note: each undeclared identifier is reported only once for each function it appears in
   net/ipv4/af_inet.c: In function 'snmp_fold_field64':
>> net/ipv4/af_inet.c:1499:39: error: 'offct' undeclared (first use in this function)
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
                                          ^
>> net/ipv4/af_inet.c:1499:10: error: too many arguments to function 'snmp_get_cpu_field'
      res += snmp_get_cpu_field(mib, cpu, offct, syncp_offset);
             ^
   net/ipv4/af_inet.c:1455:5: note: declared here
    u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
        ^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 22:40:44 -07:00
Ken-ichirou MATSUZAWA 0ef707700f netlink: rx mmap: fix POLLIN condition
Poll() returns immediately after setting the kernel current frame
(ring->head) to SKIP from user space even though there is no new
frame. And in a case of all frames is VALID, user space program
unintensionally sets (only) kernel current frame to UNUSED, then
calls poll(), it will not return immediately even though there are
VALID frames.

To avoid situations like above, I think we need to scan all frames
to find VALID frames at poll() like netlink_alloc_skb(),
netlink_forward_ring() finding an UNUSED frame at skb allocation.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:55:51 -07:00
Raghavendra K T a3a773726c net: Optimize snmp stat aggregation by walking all the percpu data at once
Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.25s
cache miss             2.7%             1.41%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.57%  docker           docker                [.] scanblock
     8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
     6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one

changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:48:58 -07:00
Raghavendra K T c4c6bc3146 net: Introduce helper functions to get the per cpu data
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-30 21:48:58 -07:00
David S. Miller 06fb4e701b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-08-30 21:45:01 -07:00
Pravin B Shelar a581b96dbf openvswitch: Remove vport-net
This structure is not used anymore.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar 8c876639c9 openvswitch: Remove vport stats.
Since all vport types are now backed by netdev, we can directly
use netdev stats. Following patch removes redundant stat
from vport.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar 3eedb41fb4 openvswitch: Remove egress_tun_info.
tun info is passed using skb-dst pointer. Now we have
converted all vports to netdev based implementation so
Now we can remove redundant pointer to tun-info from OVS_CB.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Pravin B Shelar 24d43f32d8 openvswitch: Remove vport get_name()
Remove unused get_name() function pointer from vport ops.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 19:07:15 -07:00
Simon Horman c30da49789 openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers
When an error occurs skipping IPv6 extension headers retain the already
parsed IP protocol and IPv6 addresses in the flow. Also assume that the
packet is not a fragment in the absence of information to the contrary;
that is always use the frag_off value set by ipv6_skip_exthdr().

This allows matching on the IP protocol and IPv6 addresses of packets
with malformed extension headers.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-29 13:39:59 -07:00