Commit graph

53461 commits

Author SHA1 Message Date
Jianfeng Tan
9d2f67e43b net/packet: fix packet drop as of virtio gso
When we use raw socket as the vhost backend, a packet from virito with
gso offloading information, cannot be sent out in later validaton at
xmit path, as we did not set correct skb->protocol which is further used
for looking up the gso function.

To fix this, we set this field according to virito hdr information.

Fixes: e858fae2b0 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 22:23:15 -07:00
David Ahern
1620a33695 net: Move free of dst_metrics to helper
Move the refcounting and potential free of dst metrics associated
for ipv4 and ipv6 to a common helper.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 21:54:25 -07:00
David Ahern
e1255ed4b6 net: common metrics init helper for dst_entry
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
Move the common initialization code to a helper and use for both protocols.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 21:54:19 -07:00
David Ahern
cc5f0eb216 net: Move free of fib_metrics to helper
Move the refcounting and potential free of dst metrics associated
with a fib entry to a helper and use it in both ipv4 and ipv6.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 21:54:10 -07:00
David Ahern
767a221753 net: common metrics init helper for FIB entries
Consolidate initialization of ipv4 and ipv6 metrics when fib entries
are created into a single helper, ip_fib_metrics_init, that handles
the call to ip_metrics_convert.

If no metrics are defined for the fib entry, then the metrics is set
to dst_default_metrics.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 21:54:03 -07:00
Flavio Leitner
17c357efe5 openvswitch: load NAT helper
Load the respective NAT helper module if the flow uses it.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 21:45:16 -07:00
Vinicius Costa Gomes
5a781ccbd1 tc: Add support for configuring the taprio scheduler
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.

Example configuration:

tc qdisc replace dev enp3s0 parent root handle 100 taprio \
          num_tc 3 \
	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	  queues 1@0 1@1 2@2 \
	  base-time 1528743495910289987 \
	  sched-entry S 01 300000 \
	  sched-entry S 02 300000 \
	  sched-entry S 04 300000 \
	  clockid CLOCK_TAI

The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:

     sched-entry <CMD> <GATE MASK> <INTERVAL>

The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:

 - base-time: specifies the instant when the schedule starts, if
  'base-time' is a time in the past, the schedule will start at

 	      base-time + (N * cycle-time)

   where N is the smallest integer so the resulting time is greater
   than "now", and "cycle-time" is the sum of all the intervals of the
   entries in the schedule;

 - clockid: specifies the reference clock to be used;

The parameters should be similar to what the IEEE 802.1Q family of
specification defines.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:52:23 -07:00
Vasundhara Volam
16511789b9 devlink: Add generic parameter msix_vec_per_pf_min
msix_vec_per_pf_min - This param sets the number of minimal MSIX
vectors required for the device initialization. This value is set
in the device which limits MSIX vectors per PF.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:49:42 -07:00
Vasundhara Volam
f61cba4291 devlink: Add generic parameter msix_vec_per_pf_max
msix_vec_per_pf_max - This param sets the number of MSIX vectors
that the device requests from the host on driver initialization.
This value is set in the device which is applicable per PF.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:49:42 -07:00
Vasundhara Volam
e3b5106162 devlink: Add generic parameter ignore_ari
ignore_ari - Device ignores ARI(Alternate Routing ID) capability,
even when platforms has the support and creates same number of
partitions when platform does not support ARI capability.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:49:42 -07:00
David S. Miller
9e15ff7b89 Just three small fixes:
* fix use-after-free in regulatory code
  * fix rx-mgmt key flag in AP mode (mac80211)
  * fix wireless extensions compat code memory leak
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlu2bEUACgkQB8qZga/f
 l8RAgw/7BfRpm3Kr7XW919naGkt/pQeJxUcuF9YggBpTCrp/DSLQYsjOBE5DyS/m
 728oPD8jEDehUHasWKsbG7wit1S7ImExCHTPim8C1mbABhbqdhwD4ceUvBO7RYi2
 p0+yN8X8z5D0qruMrNwhtxdE8iV9bBgmY6u1jubpJFkKLPf2euZyroH40b879CIn
 aHqB42GNJCdwO2UFaPDH2cdx5DFWrDlfA1LGbrbuzrXMBfNGWYgen2JJH/5iDOyU
 1rVXk/pUpVffp0Zde+66NtyCxxC0+hQwrTczEKXICb5qoWJpz6kugFGGO1oDQgdp
 AbM7KNrV712h/qwTEnC1NG0KUXgocpwWIuf/cuTow0vGUJSl+O2pLS/3GLOwH2du
 1u/FF4LiBc4NFXmWBPMN3LUN+Ica0/YWSbVwcv2c4guemV1EOGinlbFc+tnoue7M
 fpLkQJUYCiEVFRXGWVaSl0Hr6z+zwgfa8qHYN2yq1qyB0dYHryYiVhgKLV7yisCm
 RNy1hmVuV7rMsL3f4iUq/2xnL3U8qK1+19Mr+i58/kU4Tx2jMkWj0kivRdgYX4EN
 XcBhJzWXrb3yMldACrCji6iRnrFMqg7osyEgFiMLMl4cZfs057+qsrJyR5xajVOi
 Ws+hj3a1LukJ1nIhou4uOLAt9D7ohuJojQq6D2GLeBwfGPNkz4E=
 =QyWK
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just three small fixes:
 * fix use-after-free in regulatory code
 * fix rx-mgmt key flag in AP mode (mac80211)
 * fix wireless extensions compat code memory leak
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 13:47:39 -07:00
David S. Miller
9e50727f0e mlx5-updates-2018-10-03
mlx5 core driver and ethernet netdev updates, please note there is a small
 devlink releated update to allow extack argument to eswitch operations.
 
 From Eli Britstein,
 1) devlink: Add extack argument to the eswitch related operations
 2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
 3) net/mlx5e: Add extack messages for TC offload failures
 
 From Eran Ben Elisha,
 4) mlx5e: Add counter for aRFS rule insertion failures
 
 From Feras Daoud
 5) Fast teardown support for mlx5 device
 This change introduces the enhanced version of the "Force teardown" that
 allows SW to perform teardown in a faster way without the need to reclaim
 all the FW pages.
 Fast teardown provides the following advantages:
     1- Fix a FW race condition that could cause command timeout
     2- Avoid moving to polling mode
     3- Close the vport to prevent PCI ACK to be sent without been scatter
     to memory
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbtU45AAoJEEg/ir3gV/o+/C4H/RHA4KImrb476EdB3VNYMqAN
 dgXb+bmh6sZP+jHWqQ4c3aVeh6/T8qm4gwiSn2nVTtHEnxtCdIYljzDC1Nswczeg
 pSjD1eOP7M1LpAOmBb8xdnJcX7yM7r1bTklnp2sN853WShbsDRYgZBHsBwTzx25U
 ZdzL4QTLuohlG/aLrbGXMntIy45ya2fVQrnK54s18nFlgsdFjEs0mi0xaUKNBC6+
 P8CTohHAxuuxmL5b+6MIYLZCdgd8cLNQFdtqbckEVw7SvcRTxfraRlyqJ0YOgTGB
 TdSWnqZz2JYH29wSFbpFG8qX6GCv8FoiZ+fKzldbolHk442rrktHv3+Y7qQuZVs=
 =NVks
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-10-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-10-03

mlx5 core driver and ethernet netdev updates, please note there is a small
devlink releated update to allow extack argument to eswitch operations.

From Eli Britstein,
1) devlink: Add extack argument to the eswitch related operations
2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
3) net/mlx5e: Add extack messages for TC offload failures

From Eran Ben Elisha,
4) mlx5e: Add counter for aRFS rule insertion failures

From Feras Daoud
5) Fast teardown support for mlx5 device
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the FW pages.
Fast teardown provides the following advantages:
    1- Fix a FW race condition that could cause command timeout
    2- Avoid moving to polling mode
    3- Close the vport to prevent PCI ACK to be sent without been scatter
    to memory
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 09:48:37 -07:00
David S. Miller
f0e834e17f RxRPC development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAW7XYNfu3V2unywtrAQJCjhAAle39wPhGLcvO69dEwL3skLT74d2XTT+Q
 0e7+5eUbyNFfW+hEphZBCtTCTOFYhzxoqATZHVpEquOZOt2B5b43W97LitHvpirf
 rHAOT6Y6P6t682/f7oHiXx4tW9hfEpgXyFj1aqluAcK7lOvBQf/koy4t1DAMfiTv
 dmc0A8OwBxfKfwSmAmRY26mpuLTzGe+gZkMzMFScicJ/GmWSIw8ujfajjIFv7c0F
 kYub9KkTCsc2LfI/2KFIZVgyI0acDlwMKVmZirzlZJWTUSaf71v6ApFVNbUOSKBm
 64lm6h5jr5//x3rK6DdI6MV943uCwl5uFKbnTVu/Wv+bjvCuBviIvXuYWCHm6QMl
 uqt7Ht3qIid0VKBJc1ekrEvfDo7xi61GWV07BknRpMbUjlqadST7td7wDfuK017C
 OUzKdJFNTzNwAWzkJIgRpyyRjUSm3pTxEL7iTZ0+shENpKpT+SyzKgseNPUCHAPv
 8kprzGD3pXNjbpgZOeLcbpGbhnFTrV6NKguIAKy/NBMVlnx4yzbNTCjEzwLx/oG4
 G0KlPUzXR/ZO3CtjDGz0k/pjJOAHmLxWfaaAk+opks8mFDaKKGHAg1rzkM5KN6kk
 GGsO0vWFIRfbFqqWhojFxEtiNgrtHIStVJSZtnBG0WRdn8jdrYAnFBo9leGnS+5/
 y6PM91VI2+8=
 =b6yx
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-next-20181004' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Development

Here are some development patches for AF_RXRPC.  The most significant points
are:

 (1) Change the tracepoint that indicates a packet has been transmitted
     into one that indicates a packet is about to be transmitted.  Without
     this, the response tracepoint may occur first if the round trip is
     fast enough.

 (2) Sort out AFS address list handling to better enforce maximum capacity
     to use helper functions to fill them and to do an insertion sort to
     order them.  This is here to make (3) easier.

 (3) Keep AF_INET addresses as AF_INET addresses rather than converting
     them to AF_INET6 in both AF_RXRPC and kAFS.  I hadn't realised that a
     UDP6 socket would just call down into UDP4 if given an AF_INET
     address.

 (4) Allow the timestamp on the first DATA packet of a reply to be
     retrieved by a kernel service.  This will give the kAFS a more
     accurate base from which to calculate the callback promise expiration.

 (5) Allow the rxrpc protocol epoch value to be retrieved from an incoming
     call.  This will allow kAFS to determine if the fileserver restarted
     and if two addresses apparently assigned to the same fileserver
     actually are different boxes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 09:44:15 -07:00
David Howells
bbb4c4323a dns: Allow the dns resolver to retrieve a server set
Allow the DNS resolver to retrieve a set of servers and their associated
addresses, ports, preference and weight ratings.

In terms of communication with userspace, "srv=1" is added to the callout
string (the '1' indicating the maximum data version supported by the
kernel) to ask the userspace side for this.

If the userspace side doesn't recognise it, it will ignore the option and
return the usual text address list.

If the userspace side does recognise it, it will return some binary data
that begins with a zero byte that would cause the string parsers to give an
error.  The second byte contains the version of the data in the blob (this
may be between 1 and the version specified in the callout data).  The
remainder of the payload is version-specific.

In version 1, the payload looks like (note that this is packed):

	u8	Non-string marker (ie. 0)
	u8	Content (0 => Server list)
	u8	Version (ie. 1)
	u8	Source (eg. DNS_RECORD_FROM_DNS_SRV)
	u8	Status (eg. DNS_LOOKUP_GOOD)
	u8	Number of servers
	foreach-server {
		u16	Name length (LE)
		u16	Priority (as per SRV record) (LE)
		u16	Weight (as per SRV record) (LE)
		u16	Port (LE)
		u8	Source (eg. DNS_RECORD_FROM_NSS)
		u8	Status (eg. DNS_LOOKUP_GOT_NOT_FOUND)
		u8	Protocol (eg. DNS_SERVER_PROTOCOL_UDP)
		u8	Number of addresses
		char[]	Name (not NUL-terminated)
		foreach-address {
			u8		Family (AF_INET{,6})
			union {
				u8[4]	ipv4_addr
				u8[16]	ipv6_addr
			}
		}
	}

This can then be used to fetch a whole cell's VL-server configuration for
AFS, for example.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 09:40:52 -07:00
David Howells
e908bcf4f1 rxrpc: Allow the reply time to be obtained on a client call
Allow the epoch value to be queried on a server connection.  This is in the
rxrpc header of every packet for use in routing and is derived from the
client's state.  It's also not supposed to change unless the client gets
restarted.

AFS can make use of this information to deduce whether a fileserver has
been restarted because the fileserver makes client calls to the filesystem
driver's cache manager to send notifications (ie. callback breaks) about
conflicting changes from other clients.  These convey the fileserver's own
epoch value back to the filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:54:29 +01:00
David Howells
2070a3e449 rxrpc: Allow the reply time to be obtained on a client call
Allow the timestamp on the sk_buff holding the first DATA packet of a reply
to be queried.  This can then be used as a base for the expiry time
calculation on the callback promise duration indicated by an operation
result.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:42:29 +01:00
Joe Stringer
d71019b54b net: core: Fix build with CONFIG_IPV6=m
Stephen Rothwell reports the following link failure with IPv6 as module:

  x86_64-linux-gnu-ld: net/core/filter.o: in function `sk_lookup':
  (.text+0x19219): undefined reference to `__udp6_lib_lookup'

Fix the build by only enabling the IPv6 socket lookup if IPv6 support is
compiled into the kernel.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-04 10:32:56 +02:00
David Howells
5a790b7375 rxrpc: Drop the local endpoint arg from rxrpc_extract_addr_from_skb()
rxrpc_extract_addr_from_skb() doesn't use the argument that points to the
local endpoint, so remove the argument.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:32:28 +01:00
David Howells
46894a1359 rxrpc: Use IPv4 addresses throught the IPv6
AF_RXRPC opens an IPv6 socket through which to send and receive network
packets, both IPv6 and IPv4.  It currently turns AF_INET addresses into
AF_INET-as-AF_INET6 addresses based on an assumption that this was
necessary; on further inspection of the code, however, it turns out that
the IPv6 code just farms packets aimed at AF_INET addresses out to the IPv4
code.

Fix AF_RXRPC to use AF_INET addresses directly when given them.

Fixes: 7b674e390e ("rxrpc: Fix IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:32:28 +01:00
David Howells
b3cfb6f567 rxrpc: Emit the data Tx trace line before transmitting
Print the data Tx trace line before transmitting so that it appears before
the trace lines indicating success or failure of the transmission.  This
makes the trace log less confusing.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:32:27 +01:00
David Howells
d2944b1c66 rxrpc: Use rxrpc_free_skb() rather than rxrpc_lose_skb()
rxrpc_lose_skb() is now exactly the same as rxrpc_free_skb(), so remove it
and use the latter instead.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:32:27 +01:00
David S. Miller
6f41617bf2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 21:00:17 -07:00
Eli Britstein
db7ff19e7b devlink: Add extack for eswitch operations
Add extack argument to the eswitch related operations.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-03 16:17:58 -07:00
Gustavo A. R. Silva
2cc543f5cd sctp: fix fall-through annotation
Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 09:33:13 -07:00
Chenbo Feng
e9837e55b0 netfilter: xt_quota: fix the behavior of xt_quota module
A major flaw of the current xt_quota module is that quota in a specific
rule gets reset every time there is a rule change in the same table. It
makes the xt_quota module not very useful in a table in which iptables
rules are changed at run time. This fix introduces a new counter that is
visible to userspace as the remaining quota of the current rule. When
userspace restores the rules in a table, it can restore the counter to
the remaining quota instead of resetting it to the full quota.

Signed-off-by: Chenbo Feng <fengc@google.com>
Suggested-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-03 11:32:54 +02:00
Vakul Garg
4e6d47206c tls: Add support for inplace records encryption
Presently, for non-zero copy case, separate pages are allocated for
storing plaintext and encrypted text of records. These pages are stored
in sg_plaintext_data and sg_encrypted_data scatterlists inside record
structure. Further, sg_plaintext_data & sg_encrypted_data are passed
to cryptoapis for record encryption. Allocating separate pages for
plaintext and encrypted text is inefficient from both required memory
and performance point of view.

This patch adds support of inplace encryption of records. For non-zero
copy case, we reuse the pages from sg_encrypted_data scatterlist to
copy the application's plaintext data. For the movement of pages from
sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new
function move_to_plaintext_sg(). This function add pages into
sg_plaintext_data from sg_encrypted_data scatterlists.

tls_do_encryption() is modified to pass the same scatterlist as both
source and destination into aead_request_set_crypt() if inplace crypto
has been enabled. A new ariable 'inplace_crypto' has been introduced in
record structure to signify whether the same scatterlist can be used.
By default, the inplace_crypto is enabled in get_rec(). If zero-copy is
used (i.e. plaintext data is not copied), inplace_crypto is set to '0'.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 23:03:47 -07:00
David S. Miller
00538ba915 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2018-09-30

Here's the first bluetooth-next pull request for the 4.20 kernel.

 - Fixes & cleanups to hci_qca driver
 - NULL dereference fix to debugfs
 - Improved L2CAP Connection-oriented Channel MTU & MPS handling
 - Added support for USB-based RTL8822C controller
 - Added device ID for BCM4335C0 UART-based controller
 - Various other smaller cleanups & fixes

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:34:54 -07:00
Eric Dumazet
64199fc0a4 ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy,
do not do it.

Fixes: 2efd4fca70 ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:32:05 -07:00
Patrick Ruddy
e4a38c0c4b ipv6: add vrf table handling code for ipv6 mcast
The code to obtain the correct table for the incoming interface was
missing for IPv6. This has been added along with the table creation
notification to fib rules for the RTNL_FAMILY_IP6MR address family.

Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:29:08 -07:00
Robert Shearman
854da99173 ipv4: Allow sending multicast packets on specific i/f using VRF socket
It is useful to be able to use the same socket for listening in a
specific VRF, as for sending multicast packets out of a specific
interface. However, the bound device on the socket currently takes
precedence and results in the packets not being sent.

Relax the condition on overriding the output interface to use for
sending packets out of UDP, raw and ping sockets to allow multicast
packets to be sent using the specified multicast interface.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:28:17 -07:00
Ido Schimmel
6919622af3 bridge: mcast: Default back to multicast enabled state
Commit 13cefad2f2 ("net: bridge: convert and rename mcast disabled")
converted the 'multicast_disabled' field to an option bit named
'BROPT_MULTICAST_ENABLED'.

While the old field was implicitly initialized to 0, the new field is
not initialized, resulting in the bridge defaulting to multicast
disabled state and breaking existing applications.

Fix this by explicitly initializing the option.

Fixes: 13cefad2f2 ("net: bridge: convert and rename mcast disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:27:36 -07:00
Eric Dumazet
8873c064d1 tcp: do not release socket ownership in tcp_close()
syzkaller was able to hit the WARN_ON(sock_owned_by_user(sk));
in tcp_close()

While a socket is being closed, it is very possible other
threads find it in rtnetlink dump.

tcp_get_info() will acquire the socket lock for a short amount
of time (slow = lock_sock_fast(sk)/unlock_sock_fast(sk, slow);),
enough to trigger the warning.

Fixes: 67db3e4bfb ("tcp: no longer hold ehash lock while calling tcp_get_info()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 22:17:35 -07:00
Joe Stringer
6acc9b432e bpf: Add helper to retrieve socket in BPF
This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and
bpf_sk_lookup_udp() which allows BPF programs to find out if there is a
socket listening on this host, and returns a socket pointer which the
BPF program can then access to determine, for instance, whether to
forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the
socket, so when a BPF program makes use of this function, it must
subsequently pass the returned pointer into the newly added sk_release()
to return the reference.

By way of example, the following pseudocode would filter inbound
connections at XDP if there is no corresponding service listening for
the traffic:

  struct bpf_sock_tuple tuple;
  struct bpf_sock_ops *sk;

  populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
  sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
  if (!sk) {
    // Couldn't find a socket listening for this traffic. Drop.
    return TC_ACT_SHOT;
  }
  bpf_sk_release(sk, 0);
  return TC_ACT_OK;

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-03 02:53:47 +02:00
Joe Stringer
c64b798328 bpf: Add PTR_TO_SOCKET verifier type
Teach the verifier a little bit about a new type of pointer, a
PTR_TO_SOCKET. This pointer type is accessed from BPF through the
'struct bpf_sock' structure.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-03 02:53:47 +02:00
Maciej Żenczykowski
744486d426 net: inet6_rtm_getroute() - use new style struct initializer instead of memset
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
84db840715 net: rtm_to_fib6_config() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
8823a3acfd net: rtmsg_to_fib6_config() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
dc92095dd9 net: ip6_update_pmtu() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
d456336d16 net: remove 1 always zero parameter from ip6_redirect_no_header()
(the parameter in question is mark)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
0b26fb17ca net: ip6_redirect_no_header() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:40 -07:00
Maciej Żenczykowski
1f7f10ac4a net: ip6_redirect() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:39 -07:00
Maciej Żenczykowski
e8e3fbe92c net: inet_rtm_getroute() - use new style struct initializer instead of memset
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:39 -07:00
Maciej Żenczykowski
e351bb6227 net: ip_rt_get_source() - use new style struct initializer instead of memset
(allows for better compiler optimization)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:12:39 -07:00
Eric Dumazet
0e1d6eca51 rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
We have an impressive number of syzkaller bugs that are linked
to the fact that syzbot was able to create a networking device
with millions of TX (or RX) queues.

Let's limit the number of RX/TX queues to 4096, this really should
cover all known cases.

A separate patch will add various cond_resched() in the loops
handling sysfs entries at device creation and dismantle.

Tested:

lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
RTNETLINK answers: Invalid argument

lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap

real	0m0.180s
user	0m0.000s
sys	0m0.107s

Fixes: 76ff5cc919 ("rtnl: allow to specify number of rx and tx queues on device creation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 16:06:13 -07:00
Eric Dumazet
2ab2ddd301 inet: make sure to grab rcu_read_lock before using ireq->ireq_opt
Timer handlers do not imply rcu_read_lock(), so my recent fix
triggered a LOCKDEP warning when SYNACK is retransmit.

Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
usages instead of guessing what is done by callers, since it is
not worth the pain.

Get rid of ireq_opt_deref() helper since it hides the logic
without real benefit, since it is now a standard rcu_dereference().

Fixes: 1ad98e9d1b ("tcp/dccp: fix lockdep issue when SYN is backlogged")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 15:52:12 -07:00
Paolo Abeni
cc16567e5a net: drop unused skb_append_datato_frags()
This helper is unused since commit 988cf74deb ("inet:
Stop generating UFO packets.")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 11:18:09 -07:00
Johannes Berg
5207ca554b cfg80211: sort tracing properly
There were supposed to be two blocks - one for each direction
cfg80211 <-> driver, clean up the code to restore that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:59:07 +02:00
Johannes Berg
ec8f170bc3 cfg80211: unify sending NL80211_CMD_NEW_INTERFACE
There isn't really any need for us to be sending this from
two different places - move cfg80211_init_wdev() later and
send the notification from there, removing it from the non-
netdev case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:58:57 +02:00
Johannes Berg
85dd3da43d cfg80211: combine wdev/netdev unregister code
We currently have two places that do similar things, depending
on whether it's a wdev with or without netdev.

Combine the code to avoid having to duplicate all new additions.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:58:51 +02:00
Johannes Berg
49f9cf0e1b nl80211: add error messages to nl80211_parse_chandef()
Add some error messages to nl80211_parse_chandef() to make
failures here - especially with disabled channels - easier
to diagnose.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:58:44 +02:00
Johannes Berg
b60ad34851 cfg80211: move cookie_counter out of wiphy
There's no reason for drivers to be able to access the
cfg80211 internal cookie counter; move it out of the
wiphy into the rdev structure.

While at it, also make it never assign 0 as a cookie
(we consider that invalid in some places), and warn if
we manage to do that for some reason (wrapping is not
likely to happen with a u64.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:58:36 +02:00
Johannes Berg
71e5e88680 cfg80211: regulatory: make initialization more robust
Since my change to split out the regulatory init to occur later,
any issues during earlier cfg80211_init() or errors during the
platform device allocation would lead to crashes later. Make this
more robust by checking that the earlier initialization succeeded.

Fixes: d7be102f29 ("cfg80211: initialize regulatory keys/database later")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:58:25 +02:00
Masashi Honma
c70616bd8a mac80211: Remove unused initialization
The variable j will be initialized at trailing step.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:30 +02:00
Johannes Berg
5297c65c1d nl80211: remove nl80211_prepare_wdev_dump() skb argument
nl80211_prepare_wdev_dump() is using the output skb to look up
the network namespace, but this isn't really necessary, it can
just as well use the input skb which is available as cb->skb,
the sk is the same anyway.

Therefore, remove the redundant argument.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:30 +02:00
Pradeep Kumar Chitrapu
81e54d08d9 cfg80211: support FTM responder configuration/statistics
Allow userspace to enable fine timing measurement responder
functionality with configurable lci/civic parameters in AP mode.
This can be done at AP start or changing beacon parameters.

A new EXT_FEATURE flag is introduced for drivers to advertise
the capability.

Also nl80211 API support for retrieving statistics is added.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
[remove unused cfg80211_ftm_responder_params, clarify docs,
 move validation into policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:30 +02:00
Johannes Berg
7057f2496c cfg80211: tracing: reuse wiphy_wdev_evt for rdev_get_txq_stats
A simple cleanup, reuse the event definition that we already have.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:30 +02:00
Andrew Zaborowski
efdfce7270 nl80211: Fix a GET_KEY reply attribute
Use the NL80211_KEY_IDX attribute inside the NL80211_ATTR_KEY in
NL80211_CMD_GET_KEY responses to comply with nl80211_key_policy.
This is unlikely to affect existing userspace.

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:30 +02:00
Wei Yongjun
48f3b9e989 mac80211: fix error handling in ieee80211_register_hw()
Fix to return a negative error code -ENOMEM from the kmemdup
error handling case instead of 0.

Fixes: 09b4a4faf9 ("mac80211: introduce capability flags for VHT EXT NSS support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:29 +02:00
Johannes Berg
e4d4216e91 cfg80211: combine duplicate wdev init code
There's a bit of duplicated code to initialize a wdev, pull it out
into a separate function to call from both places.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:29 +02:00
Erik Stromdahl
a5ae326418 mac80211: fix issue with possible txq NULL pointer
Drivers that do not have the BUFF_MMPDU_TXQ flag set will not have a
TXQ for the special TID = 16.

In this case, the last member in the *struct ieee80211_sta* txq array
will be NULL.

We must check this in order not to get a NULL pointer dereference when
iterating the txq array.

Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:29 +02:00
Colin Ian King
6762696429 cfg80211: remove redundant check of !scan_plan
The check for !scan_plan is redunant as this has been checked
in the proceeding statement and the code returns -ENOBUFS if
it is true. Remove the redundant check.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:29 +02:00
zhong jiang
d25d062f55 cfg80211: remove unnecessary null pointer check in cfg80211_netdev_notifier_call
The iterator in list_for_each_entry_safe is never null, therefore, remove
the redundant null pointer check.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-02 09:56:29 +02:00
Dave Jones
6fe9487892 bond: take rcu lock in netpoll_send_skb_on_dev
The bonding driver lacks the rcu lock when it calls down into
netdev_lower_get_next_private_rcu from bond_poll_controller, which
results in a trace like:

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
before we call down into netpoll_poll_dev, so just take the lock there.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:25:25 -07:00
David Ahern
893626d6a3 rtnetlink: Fail dump if target netnsid is invalid
Link dumps can return results from a target namespace. If the namespace id
is invalid, then the dump request should fail if get_target_net fails
rather than continuing with a dump of the current namespace.

Fixes: 79e1ad148c ("rtnetlink: use netnsid to query interface")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:22:04 -07:00
Flavio Leitner
7f6d6558ae Revert "openvswitch: Fix template leak in error cases."
This reverts commit 90c7afc96c.

When the commit was merged, the code used nf_ct_put() to free
the entry, but later on commit 76644232e6 ("openvswitch: Free
tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
is a more appropriate. Now the original problem is removed.

Then 44d6e2f273 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
replaced a debug assert with a WARN_ON() which is trigged now.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:20:13 -07:00
Eric Dumazet
fb420d5d91 tcp/fq: move back to CLOCK_MONOTONIC
In the recent TCP/EDT patch series, I switched TCP and sch_fq
clocks from MONOTONIC to TAI, in order to meet the choice done
earlier for sch_etf packet scheduler.

But sure enough, this broke some setups were the TAI clock
jumps forward (by almost 50 year...), as reported
by Leonard Crestez.

If we want to converge later, we'll probably need to add
an skb field to differentiate the clock bases, or a socket option.

In the meantime, an UDP application will need to use CLOCK_MONOTONIC
base for its SCM_TXTIME timestamps if using fq packet scheduler.

Fixes: 72b0094f91 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
Fixes: 142537e419 ("net_sched: sch_fq: switch to CLOCK_TAI")
Fixes: fd2bca2aa7 ("tcp: switch internal pacing timer to CLOCK_TAI")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Leonard Crestez <leonard.crestez@nxp.com>
Tested-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:18:51 -07:00
Li RongQing
4da402597c xfrm: fix gro_cells leak when remove virtual xfrm interfaces
The device gro_cells has been initialized, it should be freed,
otherwise it will be leaked

Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-10-02 08:11:45 +02:00
Cong Wang
460b360104 net_sched: fix a crash in tc_new_tfilter()
When tcf_block_find() fails, it already rollbacks the qdisc refcnt,
so its caller doesn't need to clean up this again. Avoid calling
qdisc_put() again by resetting qdisc to NULL for callers.

Reported-by: syzbot+37b8770e6d5a8220a039@syzkaller.appspotmail.com
Fixes: e368fdb61d ("net: sched: use Qdisc rcu API instead of relying on rtnl lock")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:09:41 -07:00
David S. Miller
92d7c74b6f Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2018-09-27

Here's one more Bluetooth fix for 4.19, fixing the handling of an
attempt to unpair a device while pairing is in progress.

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:40:39 -07:00
Soheil Hassas Yeganeh
789762ceec tcp: adjust rcv zerocopy hints based on frag sizes
When SKBs are coalesced, we can have SKBs with different
frag sizes. Some with PAGE_SIZE and some not with PAGE_SIZE.
Since recv_skip_hint is always set to the full SKB size,
it can overestimate the amount that should be read using
normal read for coalesced packets.

Change the recv_skip_hint so that it only includes the first
frags that are not of PAGE_SIZE.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:36:56 -07:00
Soheil Hassas Yeganeh
8f2b029311 tcp: set recv_skip_hint when tcp_inq is less than PAGE_SIZE
When we have less than PAGE_SIZE of data on receive queue,
we set recv_skip_hint to 0. Instead, set it to the actual
number of bytes available.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:36:56 -07:00
LUU Duc Canh
d949cfedbc tipc: ignore STATE_MSG on wrong link session
The initial session number when a link is created is based on a random
value, taken from struct tipc_net->random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.

However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.

We now fix this by randomizing the session number each time a link
is created.

With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.

Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:35:30 -07:00
Dan Carpenter
aeadd93f2b net: sched: act_ipt: check for underflow in __tcf_ipt_init()
If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
return -EINVAL.  But we don't check whether it's smaller than
sizeof(struct xt_entry_target) and that could lead to an out of bounds
read.

Fixes: 7ba699c604 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:34:14 -07:00
David S. Miller
2240c12d7d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-01

1) Make xfrmi_get_link_net() static to silence a sparse warning.
   From Wei Yongjun.

2) Remove a unused esph pointer definition in esp_input().
   From Haishuang Yan.

3) Allow the NIC driver to quietly refuse xfrm offload
   in case it does not support it, the SA is created
   without offload in this case.
   From Shannon Nelson.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:31:17 -07:00
David S. Miller
ee0b6f4834 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-10-01

1) Validate address prefix lengths in the xfrm selector,
   otherwise we may hit undefined behaviour in the
   address matching functions if the prefix is too
   big for the given address family.

2) Fix skb leak on local message size errors.
   From Thadeu Lima de Souza Cascardo.

3) We currently reset the transport header back to the network
   header after a transport mode transformation is applied. This
   leads to an incorrect transport header when multiple transport
   mode transformations are applied. Reset the transport header
   only after all transformations are already applied to fix this.
   From Sowmini Varadhan.

4) We only support one offloaded xfrm, so reset crypto_done after
   the first transformation in xfrm_input(). Otherwise we may call
   the wrong input method for subsequent transformations.
   From Sowmini Varadhan.

5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
   skb_dst_force does not really force a dst refcount anymore, it might
   clear it instead. xfrm code did not expect this, add a check to not
   dereference skb_dst() if it was cleared by skb_dst_force.

6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
   read in xfrm_state_find. From Sean Tranchetti.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:29:25 -07:00
Yuchung Cheng
041a14d267 tcp: start receiver buffer autotuning sooner
Previously receiver buffer auto-tuning starts after receiving
one advertised window amount of data. After the initial receiver
buffer was raised by patch a337531b94 ("tcp: up initial rmem to
128KB and SYN rwin to around 64KB"), the reciver buffer may take
too long to start raising. To address this issue, this patch lowers
the initial bytes expected to receive roughly the expected sender's
initial window.

Fixes: a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:45:26 -07:00
Eric Dumazet
1ad98e9d1b tcp/dccp: fix lockdep issue when SYN is backlogged
In normal SYN processing, packets are handled without listener
lock and in RCU protected ingress path.

But syzkaller is known to be able to trick us and SYN
packets might be processed in process context, after being
queued into socket backlog.

In commit 06f877d613 ("tcp/dccp: fix other lockdep splats
accessing ireq_opt") I made a very stupid fix, that happened
to work mostly because of the regular path being RCU protected.

Really the thing protecting ireq->ireq_opt is RCU read lock,
and the pseudo request refcnt is not relevant.

This patch extends what I did in commit 449809a66c ("tcp/dccp:
block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
pair in the paths that might be taken when processing SYN from
socket backlog (thus possibly in process context)

Fixes: 06f877d613 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:42:13 -07:00
David S. Miller
c8424ddd97 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) Skip ip_sabotage_in() for packet making into the VRF driver,
   otherwise packets are dropped, from David Ahern.

2) Clang compilation warning uncovering typo in the
   nft_validate_register_store() call from nft_osf, from Stefan Agner.

3) Double sizeof netlink message length calculations in ctnetlink,
   from zhong jiang.

4) Missing rb_erase() on batch full in rbtree garbage collector,
   from Taehee Yoo.

5) Calm down compilation warning in nf_hook(), from Florian Westphal.

6) Missing check for non-null sk in xt_socket before validating
   netns procedence, from Flavio Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:41:01 -07:00
Roman Gushchin
8bad74f984 bpf: extend cgroup bpf core to allow multiple cgroup storage types
In order to introduce per-cpu cgroup storage, let's generalize
bpf cgroup core to support multiple cgroup storage types.
Potentially, per-node cgroup storage can be added later.

This commit is mostly a formal change that replaces
cgroup_storage pointer with a array of cgroup_storage pointers.
It doesn't actually introduce a new storage type,
it will be done later.

Each bpf program is now able to have one cgroup storage of each type.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Yu Zhao
1db5852945 cfg80211: fix use-after-free in reg_process_hint()
reg_process_hint_country_ie() can free regulatory_request and return
REG_REQ_ALREADY_SET. We shouldn't use regulatory_request after it's
called. KASAN error was observed when this happens.

BUG: KASAN: use-after-free in reg_process_hint+0x839/0x8aa [cfg80211]
Read of size 4 at addr ffff8800c430d434 by task kworker/1:3/89
<snipped>
Workqueue: events reg_todo [cfg80211]
Call Trace:
 dump_stack+0xc1/0x10c
 ? _atomic_dec_and_lock+0x1ad/0x1ad
 ? _raw_spin_lock_irqsave+0xa0/0xd2
 print_address_description+0x86/0x26f
 ? reg_process_hint+0x839/0x8aa [cfg80211]
 kasan_report+0x241/0x29b
 reg_process_hint+0x839/0x8aa [cfg80211]
 reg_todo+0x204/0x5b9 [cfg80211]
 process_one_work+0x55f/0x8d0
 ? worker_detach_from_pool+0x1b5/0x1b5
 ? _raw_spin_unlock_irq+0x65/0xdd
 ? _raw_spin_unlock_irqrestore+0xf3/0xf3
 worker_thread+0x5dd/0x841
 ? kthread_parkme+0x1d/0x1d
 kthread+0x270/0x285
 ? pr_cont_work+0xe3/0xe3
 ? rcu_read_unlock_sched_notrace+0xca/0xca
 ret_from_fork+0x22/0x40

Allocated by task 2718:
 set_track+0x63/0xfa
 __kmalloc+0x119/0x1ac
 regulatory_hint_country_ie+0x38/0x329 [cfg80211]
 __cfg80211_connect_result+0x854/0xadd [cfg80211]
 cfg80211_rx_assoc_resp+0x3bc/0x4f0 [cfg80211]
smsc95xx v1.0.6
 ieee80211_sta_rx_queued_mgmt+0x1803/0x7ed5 [mac80211]
 ieee80211_iface_work+0x411/0x696 [mac80211]
 process_one_work+0x55f/0x8d0
 worker_thread+0x5dd/0x841
 kthread+0x270/0x285
 ret_from_fork+0x22/0x40

Freed by task 89:
 set_track+0x63/0xfa
 kasan_slab_free+0x6a/0x87
 kfree+0xdc/0x470
 reg_process_hint+0x31e/0x8aa [cfg80211]
 reg_todo+0x204/0x5b9 [cfg80211]
 process_one_work+0x55f/0x8d0
 worker_thread+0x5dd/0x841
 kthread+0x270/0x285
 ret_from_fork+0x22/0x40
<snipped>

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:14:03 +02:00
Felix Fietkau
211710ca74 mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys
key->sta is only valid after ieee80211_key_link, which is called later
in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is
never set when management frame protection is enabled.

Fixes: e548c49e6d ("mac80211: add key flag for management keys")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:13:48 +02:00
Stefan Seyfried
848e616e66 cfg80211: fix wext-compat memory leak
cfg80211_wext_giwrate and sinfo.pertid might allocate sinfo.pertid via
rdev_get_station(), but never release it. Fix that.

Fixes: 8689c051a2 ("cfg80211: dynamically allocate per-tid stats for station info")
Signed-off-by: Stefan Seyfried <seife+kernel@b1-systems.com>
[johannes: fix error path, use cfg80211_sinfo_release_content(), add Fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:11:36 +02:00
LUU Duc Canh
c140eb166d tipc: fix failover problem
We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
   Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
   A on node 2. The node item 1->2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
   down its parallell link endpoint B/2, while producing a FAILOVER
   message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
   created.
5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
   any messages that can bring B/1 up again. We are left with a non-
   redundant link between node 1 and 2.

We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.

Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:45:14 -07:00
Yifeng Sun
804fe108fc openvswitch: Use correct reply values in datapath and vport ops
This patch fixes the bug that all datapath and vport ops are returning
wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:44:11 -07:00
Vakul Garg
80ece6a03a tls: Remove redundant vars from tls record structure
Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point
to a aad_space and then chain scatterlists sg_plaintext_data,
sg_encrypted_data respectively. Rather than using chained scatterlists
for plaintext and encrypted data in aead_req, it is efficient to store
aad_space in sg_encrypted_data and sg_plaintext_data itself in the
first index and get rid of sg_aead_in, sg_aead_in and further chaining.

This requires increasing size of sg_encrypted_data & sg_plaintext_data
arrarys by 1 to accommodate entry for aad_space. The code which uses
sg_encrypted_data and sg_plaintext_data has been modified to skip first
index as it points to aad_space.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:41:28 -07:00
David S. Miller
f810dcece0 RxRPC fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAW634JPu3V2unywtrAQL90g//ZypSdmc//fh69DltyPE3jiu4suHBR5+m
 zB8XSwEmxeB6uB8325d3LtcJjwrTQjniAuQ/nY2Pv0Lgyrlejc/edUI5o8mwpoPS
 2olfMKrig45shX6qrYM+2c+OJXz4yv6XTUYVRTUzfK0GrdXzYFSKP4Vqqv5bXWqd
 NK7lSkRNX80I76ex/SuBtnnc/jgvkLRYgYJSpABrsIXbb6mF0RJiosJNRSY3e70a
 VZmmecA1yptF4KaiSRtTE3BS+V2iTxGrJSwoQI0+5ZszytKhAjGvhO9AoVicoA0A
 6cSquDmqnIf/87CuErcWAnmCRb/IqBLMU7D3+Tcc38itwf672f5t364qqc/ZMgG6
 GVEHdBo/h82aq5bM5+QWfEN8HMswC441VQ/OP+qJI/j8lwl/fht6oIsd/Um3zWKV
 W1YHyMGKxamNr7vmHxl/CjXK6dJmFOGNFvMJUrjDRtue/ISn1v+kjEigQEUBxbKe
 sPTFWt4OKYgvCjx7R94KYelLxQ9DCpe++wxi/gU4gGJChw1PLQJpbTvSuAOHSQbb
 GNNKgSx9eLdijAN4uAxqz+EBAT4S+1QbvQ0Ng+lZeIVC/+QrQfwdO3+3EyhvGngA
 X91K6MMhtPXkeXvcbN0PsKkyDLp1xgg0/K5Jsztdn9LUC7p4J7aPqmwly26Xm2HT
 1ek9wPoAsiI=
 =MDV3
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-fixes-20180928' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fixes

Here are some miscellaneous fixes for AF_RXRPC:

 (1) Remove a duplicate variable initialisation.

 (2) Fix one of the checks made when we decide to set up a new incoming
     service call in which a flag is being checked in the wrong field of
     the packet header.  This check is abstracted out into helper
     functions.

 (3) Fix RTT gathering.  The code has been trying to make use of socket
     timestamps, but wasn't actually enabling them.  The code has also been
     recording a transmit time for the outgoing packet for which we're
     going to measure the RTT after sending the message - but we can get
     the incoming packet before we get to that and record a negative RTT.

 (4) Fix the emission of BUSY packets (we are emitting ABORTs instead).

 (5) Improve error checking on incoming packets.

 (6) Try to fix a bug in new service call handling whereby a BUG we should
     never be able to reach somehow got triggered.  Do this by moving much
     of the checking as early as possible and not repeating it later
     (depends on (5) above).

 (7) Fix the sockopts set on a UDP6 socket to include the ones set on a
     UDP4 socket so that we receive UDP4 errors and packet-too-large
     notifications too.

 (8) Fix the distribution of errors so that we do it at the point of
     receiving an error in the UDP callback rather than deferring it
     thereby cutting short any transmissions that would otherwise occur in
     the window.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:28:17 -07:00
Tung Nguyen
6787927475 tipc: buffer overflow handling in listener socket
Default socket receive buffer size for a listener socket is 2Mb. For
each arriving empty SYN, the linux kernel allocates a 768 bytes buffer.
This means that a listener socket can serve maximum 2700 simultaneous
empty connection setup requests before it hits a receive buffer
overflow, and much fewer if the SYN is carrying any significant
amount of data.

When this happens the setup request is rejected, and the client
receives an ECONNREFUSED error.

This commit mitigates this problem by letting the client socket try to
retransmit the SYN message multiple times when it sees it rejected with
the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals
in the range of [100 ms, setup_timeout / 4], as many times as there is
room for within the setup timeout limit.

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:24:22 -07:00
Jon Maloy
25b9221b95 tipc: add SYN bit to connection setup messages
Messages intended for intitating a connection are currently
indistinguishable from regular datagram messages. The TIPC
protocol specification defines bit 17 in word 0 as a SYN bit
to allow sanity check of such messages in the listening socket,
but this has so far never been implemented.

We do that in this commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:24:22 -07:00
Jon Maloy
39fdc9c71f tipc: refactor function tipc_sk_filter_connect()
We refactor the function tipc_sk_filter_connect(), both to make it
more readable and as a preparation for the next commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:24:22 -07:00
Jon Maloy
afe8792fec tipc: refactor function tipc_sk_timeout()
We refactor this function as a preparation for the coming commits in
the same series.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:24:22 -07:00
Jon Maloy
5cbdbd1a1f tipc: refactor function tipc_msg_reverse()
The function tipc_msg_reverse() is reversing the header of a message
while reusing the original buffer. We have seen at several occasions
that this may have unfortunate side effects when the buffer to be
reversed is a clone.

In one of the following commits we will again need to reverse cloned
buffers, so this is the right time to permanently eliminate this
problem. In this commit we let the said function always consume the
original buffer and replace it with a new one when applicable.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:24:22 -07:00
Yuchung Cheng
a337531b94 tcp: up initial rmem to 128KB and SYN rwin to around 64KB
Previously TCP initial receive buffer is ~87KB by default and
the initial receive window is ~29KB (20 MSS). This patch changes
the two numbers to 128KB and ~64KB (rounding down to the multiples
of MSS) respectively. The patch also simplifies the calculations s.t.
the two numbers are directly controlled by sysctl tcp_rmem[1]:

  1) Initial receiver buffer budget (sk_rcvbuf): while this should
     be configured via sysctl tcp_rmem[1], previously tcp_fixup_rcvbuf()
     always override and set a larger size when a new connection
     establishes.

  2) Initial receive window in SYN: previously it is set to 20
     packets if MSS <= 1460. The number 20 was based on the initial
     congestion window of 10: the receiver needs twice amount to
     avoid being limited by the receive window upon out-of-order
     delivery in the first window burst. But since this only
     applies if the receiving MSS <= 1460, connection using large MTU
     (e.g. to utilize receiver zero-copy) may be limited by the
     receive window.

With this patch TCP memory configuration is more straight-forward and
more properly sized to modern high-speed networks by default. Several
popular stacks have been announcing 64KB rwin in SYNs as well.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29 11:22:22 -07:00
Matias Karhumaa
30d65e0804 Bluetooth: Fix debugfs NULL pointer dereference
Fix crash caused by NULL pointer dereference when debugfs functions
le_max_key_read, le_max_key_size_write, le_min_key_size_read or
le_min_key_size_write and Bluetooth adapter was powered off.

Fix is to move max_key_size and min_key_size from smp_dev to hci_dev.
At the same time they were renamed to le_max_key_size and
le_min_key_size.

BUG: unable to handle kernel NULL pointer dereference at 00000000000002e8
PGD 0 P4D 0
Oops: 0000 [#24] SMP PTI
CPU: 2 PID: 6255 Comm: cat Tainted: G      D    OE     4.18.9-200.fc28.x86_64 #1
Hardware name: LENOVO 4286CTO/4286CTO, BIOS 8DET76WW (1.46 ) 06/21/2018
RIP: 0010:le_max_key_size_read+0x45/0xb0 [bluetooth]
Code: 00 00 00 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 87 c8 00 00 00 48 8d 7c 24 04 48 8b 80 48 0a 00 00 <48> 8b 80 e8 02 00 00 0f b6 48 52 e8 fb b6 b3 ed be 04 00 00 00 48
RSP: 0018:ffffab23c3ff3df0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007f0b4ca2e000 RCX: ffffab23c3ff3f08
RDX: ffffffffc0ddb033 RSI: 0000000000000004 RDI: ffffab23c3ff3df4
RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffab23c3ff3ed8 R11: 0000000000000000 R12: ffffab23c3ff3f08
R13: 00007f0b4ca2e000 R14: 0000000000020000 R15: ffffab23c3ff3f08
FS:  00007f0b4ca0f540(0000) GS:ffff91bd5e280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002e8 CR3: 00000000629fa006 CR4: 00000000000606e0
Call Trace:
 full_proxy_read+0x53/0x80
 __vfs_read+0x36/0x180
 vfs_read+0x8a/0x140
 ksys_read+0x4f/0xb0
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-28 20:53:48 +02:00
Eric Dumazet
c24498c682 netpoll: do not test NAPI_STATE_SCHED in poll_one_napi()
Since we do no longer require NAPI drivers to provide
an ndo_poll_controller(), napi_schedule() has not been done
before poll_one_napi() invocation.

So testing NAPI_STATE_SCHED is likely to cause early returns.

While we are at it, remove outdated comment.

Note to future bisections : This change might surface prior
bugs in drivers. See commit 73f21c653f ("bnxt_en: Fix TX
timeout during netpoll.") for one occurrence.

Fixes: ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 11:12:28 -07:00
Wei Yongjun
5362700c94 net: sched: make function qdisc_free_cb() static
Fixes the following sparse warning:

net/sched/sch_generic.c:944:6: warning:
 symbol 'qdisc_free_cb' was not declared. Should it be static?

Fixes: 3a7d0d07a3 ("net: sched: extend Qdisc with rcu")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 11:04:58 -07:00
Nikolay Aleksandrov
1288aa7af2 net: bridge: explicitly zero is_sticky in fdb_create
We need to explicitly zero is_sticky when creating a new fdb, otherwise
we might get a stale value for a new entry.

Fixes: 435f2e7cc0 ("net: bridge: add support for sticky fdb entries")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 10:43:11 -07:00
David S. Miller
05c5e9ff22 More patches than I'd like perhaps, but each seems reasonable:
* two new spectre-v1 mitigations in nl80211
  * TX status fix in general, and mesh in particular
  * powersave vs. offchannel fix
  * regulatory initialization fix
  * fix for a queue hang due to a bad return value
  * allocate TXQs for active monitor interfaces, fixing my
    earlier patch to avoid unnecessary allocations where I
    missed this case needed them
  * fix TDLS data frames priority assignment
  * fix scan results processing to take into account duplicate
    channel numbers (over different operating classes, but we
    don't necessarily know the operating class)
  * various hwsim fixes for radio destruction and new radio
    announcement messages
  * remove an extraneous kernel-doc line
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlusvBUACgkQB8qZga/f
 l8RGlg/+LeQVHh62PJcawXxxa7UyxBhapgkrXC/Np8hnd39xUL41SNMz2wIt7snu
 W2iY+5DOJZ9UyrVL70hMZ26lOLonoafhRdH0utQX3U13iD+3g35Xrps/NETlZxcy
 QftxstUeh9ojdKcOmTtOlHt3PkLCSVa6kS6TsfFGXnMeAgpbmBlPpJQhsCf1x2M4
 ZBUn92sdQu4c8goFUIj8XhgccJbIuAi9pOgQDOE7EXTrxq+Y1SBl1qYCH1bjOkPd
 O5lvV74Sr1SPWfWE6IgUPyRGTEjrImfKm5M/1aegcQ6KRk3LZYtNfjxcrRQg9uzA
 5FMO6zNpu5EEyspS18F1F/9cLo5meqVW4A+Qr6FQ8NwAJ7O31CerEFHUwSHtLfgL
 EjMuWfoKbFaj8YOeShdR3N305Eitc5/uF4m+FVee1tig3GnmUEFzNB0ngMqwFmed
 esHbLlVPDYqoKzN7wZYFB/rTDcQxtcHX8m1SQfMqRgmLVkfKS1AUHk0kql++pivW
 ZzNlePz3gzacurTeU8FqktxXQfdB8CRyeLqKapTE+3mY9HgYZxnKz1wV6lyoMGiM
 yfn0FjsRaYgNXeB0hJ8J6w/+DJWRwahVyas1QbCZKjcY9A67LRpiRlUwEGVGtu/j
 OaNWDFLPVkERYNg/7qhMym5h0RyvcBumYd9TVzwupnTiV3T1ScY=
 =ySdv
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
More patches than I'd like perhaps, but each seems reasonable:
 * two new spectre-v1 mitigations in nl80211
 * TX status fix in general, and mesh in particular
 * powersave vs. offchannel fix
 * regulatory initialization fix
 * fix for a queue hang due to a bad return value
 * allocate TXQs for active monitor interfaces, fixing my
   earlier patch to avoid unnecessary allocations where I
   missed this case needed them
 * fix TDLS data frames priority assignment
 * fix scan results processing to take into account duplicate
   channel numbers (over different operating classes, but we
   don't necessarily know the operating class)
 * various hwsim fixes for radio destruction and new radio
   announcement messages
 * remove an extraneous kernel-doc line
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 10:41:59 -07:00
Wei Yongjun
bf17b67198 net/tls: Make function get_rec() static
Fixes the following sparse warning:

net/tls/tls_sw.c:655:16: warning:
 symbol 'get_rec' was not declared. Should it be static?

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 10:25:11 -07:00
Wei Yongjun
5d70a67018 net/core: make function ___gnet_stats_copy_basic() static
Fixes the following sparse warning:

net/core/gen_stats.c:166:1: warning:
 symbol '___gnet_stats_copy_basic' was not declared. Should it be static?

Fixes: 5e111210a4 ("net/core: Add new basic hardware counter")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 10:25:11 -07:00
YueHaibing
eddf11e18d net: ovs: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 10:25:11 -07:00
Flavio Leitner
40e4f26e6a netfilter: xt_socket: check sk before checking for netns.
Only check for the network namespace if the socket is available.

Fixes: f564650106 ("netfilter: check if the socket netns is correct.")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:47:41 +02:00
Taehee Yoo
a13f814a67 netfilter: nft_set_rbtree: add missing rb_erase() in GC routine
The nft_set_gc_batch_check() checks whether gc buffer is full.
If gc buffer is full, gc buffer is released by
the nft_set_gc_batch_complete() internally.
In case of rbtree, the rb_erase() should be called before calling the
nft_set_gc_batch_complete(). therefore the rb_erase() should
be called before calling the nft_set_gc_batch_check() too.

test commands:
   table ip filter {
	   set set1 {
		   type ipv4_addr; flags interval, timeout;
		   gc-interval 10s;
		   timeout 1s;
		   elements = {
			   1-2,
			   3-4,
			   5-6,
			   ...
			   10000-10001,
		   }
	   }
   }
   %nft -f test.nft

splat looks like:
[  430.273885] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  430.282158] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  430.283116] CPU: 1 PID: 190 Comm: kworker/1:2 Tainted: G    B             4.18.0+ #7
[  430.283116] Workqueue: events_power_efficient nft_rbtree_gc [nf_tables_set]
[  430.313559] RIP: 0010:rb_next+0x81/0x130
[  430.313559] Code: 08 49 bd 00 00 00 00 00 fc ff df 48 bb 00 00 00 00 00 fc ff df 48 85 c0 75 05 eb 58 48 89 d4
[  430.313559] RSP: 0018:ffff88010cdb7680 EFLAGS: 00010207
[  430.313559] RAX: 0000000000b84854 RBX: dffffc0000000000 RCX: ffffffff83f01973
[  430.313559] RDX: 000000000017090c RSI: 0000000000000008 RDI: 0000000000b84864
[  430.313559] RBP: ffff8801060d4588 R08: fffffbfff09bc349 R09: fffffbfff09bc349
[  430.313559] R10: 0000000000000001 R11: fffffbfff09bc348 R12: ffff880100f081a8
[  430.313559] R13: dffffc0000000000 R14: ffff880100ff8688 R15: dffffc0000000000
[  430.313559] FS:  0000000000000000(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  430.313559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.313559] CR2: 0000000001551008 CR3: 000000005dc16000 CR4: 00000000001006e0
[  430.313559] Call Trace:
[  430.313559]  nft_rbtree_gc+0x112/0x5c0 [nf_tables_set]
[  430.313559]  process_one_work+0xc13/0x1ec0
[  430.313559]  ? _raw_spin_unlock_irq+0x29/0x40
[  430.313559]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[  430.313559]  ? set_load_weight+0x270/0x270
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __switch_to_asm+0x40/0x70
[  430.313559]  ? __switch_to_asm+0x34/0x70
[  430.313559]  ? __schedule+0x6d3/0x1f50
[  430.313559]  ? find_held_lock+0x39/0x1c0
[  430.313559]  ? __sched_text_start+0x8/0x8
[  430.313559]  ? cyc2ns_read_end+0x10/0x10
[  430.313559]  ? save_trace+0x300/0x300
[  430.313559]  ? sched_clock_local+0xd4/0x140
[  430.313559]  ? find_held_lock+0x39/0x1c0
[  430.313559]  ? worker_thread+0x353/0x1120
[  430.313559]  ? worker_thread+0x353/0x1120
[  430.313559]  ? lock_contended+0xe70/0xe70
[  430.313559]  ? __lock_acquire+0x4500/0x4500
[  430.535635]  ? do_raw_spin_unlock+0xa5/0x330
[  430.535635]  ? do_raw_spin_trylock+0x101/0x1a0
[  430.535635]  ? do_raw_spin_lock+0x1f0/0x1f0
[  430.535635]  ? _raw_spin_lock_irq+0x10/0x70
[  430.535635]  worker_thread+0x15d/0x1120
[ ... ]

Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:47:01 +02:00
Taehee Yoo
a2d88182d2 netfilter: nf_tables: use rhashtable_lookup() instead of rhashtable_lookup_fast()
Internally, rhashtable_lookup_fast() calls rcu_read_lock() then,
calls rhashtable_lookup(). so that in places where are guaranteed
by rcu read lock, rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:28:43 +02:00
Taehee Yoo
28c5ed2f93 netfilter: nf_flow_table: remove unnecessary nat flag check code
nf_flow_offload_{ip/ipv6}_hook() check nat flag then, call
nf_flow_nat_{ip/ipv6} but that also check nat flag. so that
nat flag check code in nf_flow_offload_{ip/ipv6}_hook() are unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:28:38 +02:00
Christian Göttsche
b473a1f5dd netfilter: nf_tables: add requirements for connsecmark support
Add ability to set the connection tracking secmark value.

Add ability to set the meta secmark value.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:28:34 +02:00
Christian Göttsche
fb96194545 netfilter: nf_tables: add SECMARK support
Add the ability to set the security context of packets within the nf_tables framework.
Add a nft_object for holding security contexts in the kernel and manipulating packets on the wire.

Convert the security context strings at rule addition time to security identifiers.
This is the same behavior like in xt_SECMARK and offers better performance than computing it per packet.

Set the maximum security context length to 256.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:28:29 +02:00
Tan Hu
097f95d319 netfilter: masquerade: don't flush all conntracks if only one address deleted on device
We configured iptables as below, which only allowed incoming data on
established connections:

iptables -t mangle -A PREROUTING -m state --state ESTABLISHED -j ACCEPT
iptables -t mangle -P PREROUTING DROP

When deleting a secondary address, current masquerade implements would
flush all conntracks on this device. All the established connections on
primary address also be deleted, then subsequent incoming data on the
connections would be dropped wrongly because it was identified as NEW
connection.

So when an address was delete, it should only flush connections related
with the address.

Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-28 14:28:26 +02:00
David Howells
f334430316 rxrpc: Fix error distribution
Fix error distribution by immediately delivering the errors to all the
affected calls rather than deferring them to a worker thread.  The problem
with the latter is that retries and things can happen in the meantime when we
want to stop that sooner.

To this end:

 (1) Stop the error distributor from removing calls from the error_targets
     list so that peer->lock isn't needed to synchronise against other adds
     and removals.

 (2) Require the peer's error_targets list to be accessed with RCU, thereby
     avoiding the need to take peer->lock over distribution.

 (3) Don't attempt to affect a call's state if it is already marked complete.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:33:17 +01:00
David Howells
37a675e768 rxrpc: Fix transport sockopts to get IPv4 errors on an IPv6 socket
It seems that enabling IPV6_RECVERR on an IPv6 socket doesn't also turn on
IP_RECVERR, so neither local errors nor ICMP-transported remote errors from
IPv4 peer addresses are returned to the AF_RXRPC protocol.

Make the sockopt setting code in rxrpc_open_socket() fall through from the
AF_INET6 case to the AF_INET case to turn on all the AF_INET options too in
the AF_INET6 case.

Fixes: f2aeed3a59 ("rxrpc: Fix error reception on AF_INET6 sockets")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:33:10 +01:00
David Howells
0099dc589b rxrpc: Make service call handling more robust
Make the following changes to improve the robustness of the code that sets
up a new service call:

 (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
     service ID check and pass that along to rxrpc_new_incoming_call().
     This means that I can remove the check from rxrpc_new_incoming_call()
     without the need to worry about the socket attached to the local
     endpoint getting replaced - which would invalidate the check.

 (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
     done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
     saving the need to repeat the search.

     This also reduces the possibility of rxrpc_publish_service_conn()
     BUG()'ing due to the detection of a duplicate connection, despite the
     initial search done by rxrpc_find_connection_rcu() having turned up
     nothing.

     This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
     non-reentrant and the result of the initial search should still hold
     true, but it has proven possible to hit.

     I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
     the iteration over the hash table if it finds a matching peer with a
     zero usage count, but I don't know for sure since it's only ever been
     hit once that I know of.

     Another possibility is that a bug in rxrpc_data_ready() that checked
     the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
     might've let through a packet that caused a spurious and invalid call
     to be set up.  That is addressed in another patch.

 (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
     usage count rather than stopping and returning not found, just in case
     there's another peer record behind it in the bucket.

 (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
     rather either use the peer cached in (2) or, if one wasn't found,
     preemptively install a new one.

Fixes: 8496af50eb ("rxrpc: Use RCU to access a peer's service connection tree")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:32:49 +01:00
David Howells
403fc2a138 rxrpc: Improve up-front incoming packet checking
Do more up-front checking on incoming packets to weed out invalid ones and
also ones aimed at services that we don't support.

Whilst we're at it, replace the clearing of call and skew if we don't find
a connection with just initialising the variables to zero at the top of the
function.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:32:31 +01:00
David Howells
ece64fec16 rxrpc: Emit BUSY packets when supposed to rather than ABORTs
In the input path, a received sk_buff can be marked for rejection by
setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
(such as an abort code) in skb->priority.  The rejection is handled by
queueing the sk_buff up for dealing with in process context.  The output
code reads the mark and priority and, theoretically, generates an
appropriate response packet.

However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
message with a random abort code is generated (since skb->priority wasn't
set to anything).

Fix this by outputting the appropriate sort of packet.

Also, whilst we're at it, most of the marks are no longer used, so remove
them and rename the remaining two to something more obvious.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:32:19 +01:00
David Howells
b604dd9883 rxrpc: Fix RTT gathering
Fix RTT information gathering in AF_RXRPC by the following means:

 (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.

 (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
     collects it, set it at that point.

 (3) Allow ACKs to be requested on the last packet of a client call, but
     not a service call.  We need to be careful lest we undo:

	bf7d620abf
	Author: David Howells <dhowells@redhat.com>
	Date:   Thu Oct 6 08:11:51 2016 +0100
	rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase

     but that only really applies to service calls that we're handling,
     since the client side gets to send the final ACK (or not).

 (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
     before only; don't update the timestamp afterwards.

 (5) Switch the ordering between recording the serial and recording the
     timestamp to always set the serial number first.  The serial number
     shouldn't be seen referenced by an ACK packet until we've transmitted
     the packet bearing it - so in the Rx path, we don't need the timestamp
     until we've checked the serial number.

Fixes: cf1a6474f8 ("rxrpc: Add per-peer RTT tracker")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:32:03 +01:00
David Howells
dc71db34e4 rxrpc: Fix checks as to whether we should set up a new call
There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
flag in the packet type field rather than in the packet flags field.

Fix this by creating a pair of helper functions to check whether the packet
is going to the client or to the server and use them generally.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-28 10:31:20 +01:00
David Howells
092ffc51fb rxrpc: Remove dup code from rxrpc_find_connection_rcu()
rxrpc_find_connection_rcu() initialises variable k twice with the same
information.  Remove one of the initialisations.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-09-27 15:13:07 +01:00
Luiz Augusto von Dentz
a5c3021bb6 Bluetooth: L2CAP: Detect if remote is not able to use the whole MPS
If the remote is not able to fully utilize the MPS choosen recalculate
the credits based on the actual amount it is sending that way it can
still send packets of MTU size without credits dropping to 0.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27 12:52:08 +02:00
Luiz Augusto von Dentz
96cd8eaa13 Bluetooth: L2CAP: Derive rx credits from MTU and MPS
Give enough rx credits for a full packet instead of using an arbitrary
number which may not be enough depending on the MTU and MPS which can
cause interruptions while waiting for more credits, also remove
debugfs entry for l2cap_le_max_credits.

With these changes the credits are restored after each SDU is received
instead of using fixed threshold, this way it is garanteed that there
will always be enough credits to send a packet without waiting more
credits to arrive.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27 12:52:08 +02:00
Luiz Augusto von Dentz
fe1493101a Bluetooth: L2CAP: Derive MPS from connection MTU
This ensures the MPS can fit in a single HCI fragment so each
segment don't have to be reassembled at HCI level, in addition to
that also remove the debugfs entry to configure the MPS.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27 12:52:08 +02:00
Ankit Navik
b950aa8863 Bluetooth: Add definitions and track LE resolve list modification
Add the definitions for adding entries to the LE resolve list and
removing entries from the LE resolve list. When the LE resolve list
gets changed via HCI commands make sure that the internal storage of
the resolve list entries gets updated.

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27 12:38:52 +02:00
Andrea Parri
5aac493787 Bluetooth: Remove unnecessary smp_mb__{before,after}_atomic
The barriers are unneeded; wait_woken() and woken_wake_function()
already provide us with the required synchronization: remove them
and document that we're relying on the (implicit) synchronization
provided by wait_woken() and woken_wake_function().

Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27 11:59:58 +02:00
Masashi Honma
1222a16014 nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
Use array_index_nospec() to sanitize i with respect to speculation.

Note that the user doesn't control i directly, but can make it out
of bounds by not finding a threshold in the array.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[add note about user control, as explained by Masashi]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-27 11:44:44 +02:00
Maciej Żenczykowski
d4ce58082f net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int
(fix documentation and sysctl access to treat it as such)

Tested:
  # zcat /proc/config.gz | egrep ^CONFIG_HZ
  CONFIG_HZ_1000=y
  CONFIG_HZ=1000
  # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294968
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
  # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294967
  # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
  # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
  -1
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:33:21 -07:00
Maciej Żenczykowski
1042caa79e net-ipv4: remove 2 always zero parameters from ipv4_redirect()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
Maciej Żenczykowski
d888f39666 net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:30:55 -07:00
Heiner Kallweit
6194114324 net: core: add member wol_enabled to struct net_device
Add flag wol_enabled to struct net_device indicating whether
Wake-on-LAN is enabled. As first user phy_suspend() will use it to
decide whether PHY can be suspended or not.

Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Fixes: e8cfd9d6c7 ("net: phy: call state machine synchronously in phy_stop")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 20:04:11 -07:00
David Ahern
36f19d5b4f net/ipv6: Remove extra call to ip6_convert_metrics for multipath case
The change to move metrics from the dst to rt6_info moved the call
to ip6_convert_metrics from ip6_route_add to ip6_route_info_create. In
doing so it makes the call in ip6_route_info_append redundant and
actually leaks the metrics installed as part of the ip6_route_info_create.
Remove the now unnecessary call.

Fixes: d4ead6b34b ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 19:57:23 -07:00
Nikolay Aleksandrov
35750b0bca net: bridge: pack net_bridge better
Further reduce the size of net_bridge with 8 bytes and reduce the number of
holes in it:
 Before: holes: 5, sum holes: 15
 After: holes: 3, sum holes: 7

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
3341d91702 net: bridge: convert mtu_set_by_user to a bit
Convert the last remaining bool option to a bit thus reducing the overall
net_bridge size further by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
c69c2cd444 net: bridge: convert neigh_suppress_enabled option to a bit
Convert the neigh_suppress_enabled option to a bit.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
675779adbf net: bridge: convert mcast options to bits
This patch converts the rest of the mcast options to bits. It also packs
the mcast options a little better by moving multicast_mld_version to an
existing hole, reducing the net_bridge size by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
13cefad2f2 net: bridge: convert and rename mcast disabled
Convert mcast disabled to an option bit and while doing so convert the
logic to check if multicast is enabled instead. That is make the logic
follow the option value - if it's set then mcast is enabled and vice versa.
This avoids a few confusing places where we inverted the value that's being
set to follow the mcast_disabled logic.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
be3664a038 net: bridge: convert group_addr_set option to a bit
Convert group_addr_set internal bridge opt to a bit.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
8df3510f28 net: bridge: convert nf call options to bits
No functional change, convert of nf_call_[ip|ip6|arp]tables to bits.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:23 -07:00
Nikolay Aleksandrov
ae75767ec2 net: bridge: add bitfield for options and convert vlan opts
Bridge options have usually been added as separate fields all over the
net_bridge struct taking up space and ending up in different cache lines.
Let's move them to a single bitfield to save up space and speedup lookups.
This patch adds a simple API for option modifying and retrieving using
bitops and converts the first user of the API - the bridge vlan options
(vlan_enabled and vlan_stats_enabled).

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:22 -07:00
Nikolay Aleksandrov
1c1cb6d032 net: bridge: make struct opening bracket consistent
Currently we have a mix of opening brackets on new lines and on the same
line, let's move them all on the same line.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 10:04:22 -07:00
Julian Wiedmann
cd11d11286 net/af_iucv: locate IUCV header via skb_network_header()
This patch attempts to untangle the TX and RX code in qeth from
af_iucv's respective HiperTransport path:
On the TX side, pointing skb_network_header() at the IUCV header
means that qeth_l3_fill_af_iucv_hdr() no longer needs a magical offset
to access the header.
On the RX side, qeth pulls the (fake) L2 header off the skb like any
normal ethernet driver would. This makes working with the IUCV header
in af_iucv easier, since we no longer have to assume a fixed skb layout.

While at it, replace the open-coded length checks in af_iucv's RX path
with pskb_may_pull().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 09:56:07 -07:00
Matias Karhumaa
cb28c306b9 Bluetooth: SMP: fix crash in unpairing
In case unpair_device() was called through mgmt interface at the same time
when pairing was in progress, Bluetooth kernel module crash was seen.

[  600.351225] general protection fault: 0000 [#1] SMP PTI
[  600.351235] CPU: 1 PID: 11096 Comm: btmgmt Tainted: G           OE     4.19.0-rc1+ #1
[  600.351238] Hardware name: Dell Inc. Latitude E5440/08RCYC, BIOS A18 05/14/2017
[  600.351272] RIP: 0010:smp_chan_destroy.isra.10+0xce/0x2c0 [bluetooth]
[  600.351276] Code: c0 0f 84 b4 01 00 00 80 78 28 04 0f 84 53 01 00 00 4d 85 ed 0f 85 ab 00 00 00 48 8b 08 48 8b 50 08 be 10 00 00 00 48 89 51 08 <48> 89 0a 48 b9 00 02 00 00 00 00 ad de 48 89 48 08 48 8b 83 00 01
[  600.351279] RSP: 0018:ffffa9be839b3b50 EFLAGS: 00010246
[  600.351282] RAX: ffff9c999ac565a0 RBX: ffff9c9996e98c00 RCX: ffff9c999aa28b60
[  600.351285] RDX: dead000000000200 RSI: 0000000000000010 RDI: ffff9c999e403500
[  600.351287] RBP: ffffa9be839b3b70 R08: 0000000000000000 R09: ffffffff92a25c00
[  600.351290] R10: ffffa9be839b3ae8 R11: 0000000000000001 R12: ffff9c995375b800
[  600.351292] R13: 0000000000000000 R14: ffff9c99619a5000 R15: ffff9c9962a01c00
[  600.351295] FS:  00007fb2be27c700(0000) GS:ffff9c999e880000(0000) knlGS:0000000000000000
[  600.351298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  600.351300] CR2: 00007fb2bdadbad0 CR3: 000000041c328001 CR4: 00000000001606e0
[  600.351302] Call Trace:
[  600.351325]  smp_failure+0x4f/0x70 [bluetooth]
[  600.351345]  smp_cancel_pairing+0x74/0x80 [bluetooth]
[  600.351370]  unpair_device+0x1c1/0x330 [bluetooth]
[  600.351399]  hci_sock_sendmsg+0x960/0x9f0 [bluetooth]
[  600.351409]  ? apparmor_socket_sendmsg+0x1e/0x20
[  600.351417]  sock_sendmsg+0x3e/0x50
[  600.351422]  sock_write_iter+0x85/0xf0
[  600.351429]  do_iter_readv_writev+0x12b/0x1b0
[  600.351434]  do_iter_write+0x87/0x1a0
[  600.351439]  vfs_writev+0x98/0x110
[  600.351443]  ? ep_poll+0x16d/0x3d0
[  600.351447]  ? ep_modify+0x73/0x170
[  600.351451]  do_writev+0x61/0xf0
[  600.351455]  ? do_writev+0x61/0xf0
[  600.351460]  __x64_sys_writev+0x1c/0x20
[  600.351465]  do_syscall_64+0x5a/0x110
[  600.351471]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  600.351474] RIP: 0033:0x7fb2bdb62fe0
[  600.351477] Code: 73 01 c3 48 8b 0d b8 6e 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 69 c7 2c 00 00 75 10 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 de 80 01 00 48 89 04 24
[  600.351479] RSP: 002b:00007ffe062cb8f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
[  600.351484] RAX: ffffffffffffffda RBX: 000000000255b3d0 RCX: 00007fb2bdb62fe0
[  600.351487] RDX: 0000000000000001 RSI: 00007ffe062cb920 RDI: 0000000000000004
[  600.351490] RBP: 00007ffe062cb920 R08: 000000000255bd80 R09: 0000000000000000
[  600.351494] R10: 0000000000000353 R11: 0000000000000246 R12: 0000000000000001
[  600.351497] R13: 00007ffe062cbbe0 R14: 0000000000000000 R15: 0000000000000000
[  600.351501] Modules linked in: algif_hash algif_skcipher af_alg cmac ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc overlay arc4 nls_iso8859_1 dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp dell_laptop kvm_intel crct10dif_pclmul dell_smm_hwmon crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media hid_multitouch input_leds joydev serio_raw dell_wmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic dell_smbios dcdbas sparse_keymap
[  600.351569]  snd_hda_intel btusb snd_hda_codec btrtl btbcm btintel snd_hda_core bluetooth(OE) snd_hwdep snd_pcm iwlmvm ecdh_generic wmi_bmof dell_wmi_descriptor snd_seq_midi mac80211 snd_seq_midi_event lpc_ich iwlwifi snd_rawmidi snd_seq snd_seq_device snd_timer cfg80211 snd soundcore mei_me mei dell_rbtn dell_smo8800 mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid i915 nouveau kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mxm_wmi psmouse ahci sdhci_pci cqhci libahci fb_sys_fops sdhci drm e1000e video wmi
[  600.351637] ---[ end trace e49e9f1df09c94fb ]---
[  600.351664] RIP: 0010:smp_chan_destroy.isra.10+0xce/0x2c0 [bluetooth]
[  600.351666] Code: c0 0f 84 b4 01 00 00 80 78 28 04 0f 84 53 01 00 00 4d 85 ed 0f 85 ab 00 00 00 48 8b 08 48 8b 50 08 be 10 00 00 00 48 89 51 08 <48> 89 0a 48 b9 00 02 00 00 00 00 ad de 48 89 48 08 48 8b 83 00 01
[  600.351669] RSP: 0018:ffffa9be839b3b50 EFLAGS: 00010246
[  600.351672] RAX: ffff9c999ac565a0 RBX: ffff9c9996e98c00 RCX: ffff9c999aa28b60
[  600.351674] RDX: dead000000000200 RSI: 0000000000000010 RDI: ffff9c999e403500
[  600.351676] RBP: ffffa9be839b3b70 R08: 0000000000000000 R09: ffffffff92a25c00
[  600.351679] R10: ffffa9be839b3ae8 R11: 0000000000000001 R12: ffff9c995375b800
[  600.351681] R13: 0000000000000000 R14: ffff9c99619a5000 R15: ffff9c9962a01c00
[  600.351684] FS:  00007fb2be27c700(0000) GS:ffff9c999e880000(0000) knlGS:0000000000000000
[  600.351686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  600.351689] CR2: 00007fb2bdadbad0 CR3: 000000041c328001 CR4: 00000000001606e0

Crash happened because list_del_rcu() was called twice for smp->ltk. This
was possible if unpair_device was called right after ltk was generated
but before keys were distributed.

In this commit smp_cancel_pairing was refactored to cancel pairing if it
is in progress and otherwise just removes keys. Once keys are removed from
rcu list, pointers to smp context's keys are set to NULL to make sure
removed list items are not accessed later.

This commit also adjusts the functionality of mgmt unpair_device() little
bit. Previously pairing was canceled only if pairing was in state that
keys were already generated. With this commit unpair_device() cancels
pairing already in earlier states.

Bug was found by fuzzing kernel SMP implementation using Synopsys
Defensics.

Reported-by: Pekka Oikarainen <pekka.oikarainen@synopsys.com>
Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2018-09-26 12:39:32 +03:00
Masashi Honma
30fe6d50eb nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
Use array_index_nospec() to sanitize ridx with respect to speculation.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-26 11:17:32 +02:00
Felix Fietkau
8105f9b8a8 mac80211: allocate TXQs for active monitor interfaces
Monitor mode interfaces with the active flag are passed down to the driver.
Drivers using TXQ expect that all interfaces have allocated TXQs before
they get added.

Fixes: 79af1f8661 ("mac80211: avoid allocating TXQs that won't be used")
Cc: stable@vger.kernel.org
Reported-by: Catrinel Catrinescu <cc@80211.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-26 11:16:54 +02:00
Parthasarathy Bhuvaragan
3f32d0be6c tipc: lock wakeup & inputq at tipc_link_reset()
In tipc_link_reset() we copy the wakeup queue to input queue using
skb_queue_splice_init(link->wakeupq, link->inputq).
This is performed without holding any locks. The lists might be
simultaneously be accessed by other cpu threads in tipc_sk_rcv(),
something leading to to random missing packets.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:48:56 -07:00
Parthasarathy Bhuvaragan
94b6ddce71 tipc: reset bearer if device carrier not ok
If we detect that under lying carrier detects errors and goes down,
we reset the bearer.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:48:56 -07:00
Roopa Prabhu
7aca011f88 bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER
Fixes: ed842faeb2 ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:33:21 -07:00
David S. Miller
105bc1306e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-25

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Allow for RX stack hardening by implementing the kernel's flow
   dissector in BPF. Idea was originally presented at netconf 2017 [0].
   Quote from merge commit:

     [...] Because of the rigorous checks of the BPF verifier, this
     provides significant security guarantees. In particular, the BPF
     flow dissector cannot get inside of an infinite loop, as with
     CVE-2013-4348, because BPF programs are guaranteed to terminate.
     It cannot read outside of packet bounds, because all memory accesses
     are checked. Also, with BPF the administrator can decide which
     protocols to support, reducing potential attack surface. Rarely
     encountered protocols can be excluded from dissection and the
     program can be updated without kernel recompile or reboot if a
     bug is discovered. [...]

   Also, a sample flow dissector has been implemented in BPF as part
   of this work, from Petar and Willem.

   [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

2) Add support for bpftool to list currently active attachment
   points of BPF networking programs providing a quick overview
   similar to bpftool's perf subcommand, from Yonghong.

3) Fix a verifier pruning instability bug where a union member
   from the register state was not cleared properly leading to
   branches not being pruned despite them being valid candidates,
   from Alexei.

4) Various smaller fast-path optimizations in XDP's map redirect
   code, from Jesper.

5) Enable to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY maps
   in bpftool, from Roman.

6) Remove a duplicate check in libbpf that probes for function
   storage, from Taeung.

7) Fix an issue in test_progs by avoid checking for errno since
   on success its value should not be checked, from Mauricio.

8) Fix unused variable warning in bpf_getsockopt() helper when
   CONFIG_INET is not configured, from Anders.

9) Fix a compilation failure in the BPF sample code's use of
   bpf_flow_keys, from Prashant.

10) Minor cleanups in BPF code, from Yue and Zhong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:29:38 -07:00
David S. Miller
71f9b61c5b Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-09-25

This series contains updates to i40e and xsk.

Mariusz fixes an issue where the VF link state was not being updated
properly when the PF is down or up.  Also cleaned up the promiscuous
configuration during a VF reset.

Patryk simplifies the code a bit to use the variables for PF and HW that
are declared, rather than using the VSI pointers.  Cleaned up the
message length parameter to several virtchnl functions, since it was not
being used (or needed).

Harshitha fixes two potential race conditions when trying to change VF
settings by creating a helper function to validate that the VF is
enabled and that the VSI is set up.

Sergey corrects a double "link down" message by putting in a check for
whether or not the link is up or going down.

Björn addresses an AF_XDP zero-copy issue that buffers passed
from userspace to the kernel was leaked when the hardware descriptor
ring was torn down.  A zero-copy capable driver picks buffers off the
fill ring and places them on the hardware receive ring to be completed at
a later point when DMA is complete. Similar on the transmit side; The
driver picks buffers off the transmit ring and places them on the
transmit hardware ring.

In the typical flow, the receive buffer will be placed onto an receive
ring (completed to the user), and the transmit buffer will be placed on
the completion ring to notify the user that the transfer is done.

However, if the driver needs to tear down the hardware rings for some
reason (interface goes down, reconfiguration and such), the userspace
buffers cannot be leaked. They have to be reused or completed back to
userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:25:00 -07:00
Vlad Buslov
787ce6d02d net: sched: use reference counting for tcf blocks on rules update
In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
0607e43994 net: sched: implement tcf_block_refcnt_{get|put}()
Implement get/put function for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by
unlocked rules update path to always hold reference to block while working
with it. They use on new fine-grained locking mechanisms introduced in
previous patches in this set, instead of relying on global protection
provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
ab2816295f net: sched: protect block idr with spinlock
Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
f00234367b net: sched: implement functions to put and flush all chains
Extract code that flushes and puts all chains on tcf block to two
standalone function to be shared with functions that locklessly get/put
reference to block.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
cfebd7e242 net: sched: change tcf block reference counter type to refcount_t
As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
e368fdb61d net: sched: use Qdisc rcu API instead of relying on rtnl lock
As a preparation from removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:36 -07:00
Vlad Buslov
3a7d0d07a3 net: sched: extend Qdisc with rcu
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:35 -07:00
Vlad Buslov
86bd446b5c net: sched: rename qdisc_destroy() to qdisc_put()
Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:35 -07:00
Vlad Buslov
6f99528e97 net: core: netlink: add helper refcount dec and lock function
Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use it with refcounter helper function that atomically
releases decrements reference and obtains mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:17:35 -07:00
Jakub Kicinski
f5bd91388e net: xsk: add a simple buffer reuse queue
XSK UMEM is strongly single producer single consumer so reuse of
frames is challenging.  Add a simple "stash" of FILL packets to
reuse for drivers to optionally make use of.  This is useful
when driver has to free (ndo_stop) or resize a ring with an active
AF_XDP ZC socket.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-25 13:13:15 -07:00
Parthasarathy Bhuvaragan
92ef12b32f tipc: fix flow control accounting for implicit connect
In the case of implicit connect message with data > 1K, the flow
control accounting is incorrect. At this state, the socket does not
know the peer nodes capability and falls back to legacy flow control
by return 1, however the receiver of this message will perform the
new block accounting. This leads to a slack and eventually traffic
disturbance.

In this commit, we perform tipc_node_get_capabilities() at implicit
connect and perform accounting based on the peer's capability.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:47:37 -07:00
Vakul Garg
c774973e91 tls: Fixed a memory leak during socket close
During socket close, if there is a open record with tx context, it needs
to be be freed apart from freeing up plaintext and encrypted scatter
lists. This patch frees up the open record if present in tx context.

Also tls_free_both_sg() has been renamed to tls_free_open_rec() to
indicate that the free record in tx context is being freed inside the
function.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:46:59 -07:00
Vakul Garg
b85135b595 tls: Fix socket mem accounting error under async encryption
Current async encryption implementation sometimes showed up socket
memory accounting error during socket close. This results in kernel
warning calltrace. The root cause of the problem is that socket var
sk_forward_alloc gets corrupted due to access in sk_mem_charge()
and sk_mem_uncharge() being invoked from multiple concurrent contexts
in multicore processor. The apis sk_mem_charge() and sk_mem_uncharge()
are called from functions alloc_plaintext_sg(), free_sg() etc. It is
required that memory accounting apis are called under a socket lock.

The plaintext sg data sent for encryption is freed using free_sg() in
tls_encryption_done(). It is wrong to call free_sg() from this function.
This is because this function may run in irq context. We cannot acquire
socket lock in this function.

We remove calling of function free_sg() for plaintext data from
tls_encryption_done() and defer freeing up of plaintext data to the time
when the record is picked up from tx_list and transmitted/freed. When
tls_tx_records() gets called, socket is already locked and thus there is
no concurrent access problem.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:43:01 -07:00
David S. Miller
a06ee256e5 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Version bump conflict in batman-adv, take what's in net-next.

iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:35:29 -07:00
Willem de Bruijn
d0e13a1488 flow_dissector: lookup netns by skb->sk if skb->dev is NULL
BPF flow dissectors are configured per network namespace.
__skb_flow_dissect looks up the netns through dev_net(skb->dev).

In some dissector paths skb->dev is NULL, such as for Unix sockets.
In these cases fall back to looking up the netns by socket.

Analyzing the codepaths leading to __skb_flow_dissect I did not find
a case where both skb->dev and skb->sk are NULL. Warn and fall back to
standard flow dissector if one is found.

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-25 17:31:19 +02:00
Paolo Abeni
ccfec9e5cb ip_tunnel: be careful when accessing the inner header
Cong noted that we need the same checks introduced by commit 76c0ddd8c3
("ip6_tunnel: be careful when accessing the inner header")
even for ipv4 tunnels.

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:27:04 -07:00
Vakul Garg
4128c0cfb1 tls: Fixed uninitialised vars warning
In tls_sw_sendmsg() and tls_sw_sendpage(), it is possible that the
uninitialised variable 'ret' gets passed to sk_stream_error(). So
initialise local variable 'ret' to '0. The warnings were detected by
'smatch' tool.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:24:52 -07:00
Vakul Garg
9932a29ab1 net/tls: Fixed race condition in async encryption
On processors with multi-engine crypto accelerators, it is possible that
multiple records get encrypted in parallel and their encryption
completion is notified to different cpus in multicore processor. This
leads to the situation where tls_encrypt_done() starts executing in
parallel on different cores. In current implementation, encrypted
records are queued to tx_ready_list in tls_encrypt_done(). This requires
addition to linked list 'tx_ready_list' to be protected. As
tls_decrypt_done() could be executing in irq content, it is not possible
to protect linked list addition operation using a lock.

To fix the problem, we remove linked list addition operation from the
irq context. We do tx_ready_list addition/removal operation from
application context only and get rid of possible multiple access to
the linked list. Before starting encryption on the record, we add it to
the tail of tx_ready_list. To prevent tls_tx_records() from transmitting
it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'.
When record encryption gets completed, tls_encrypt_done() has to only
update the 'tx_ready' flag to true & linked list add operation is not
required.

The changed logic brings some other side benefits. Since the records
are always submitted in tls sequence number order for encryption, the
tx_ready_list always remains sorted and addition of new records to it
does not have to traverse the linked list.

Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to
'tx_list'. This is because now, the some of the records at the tail are
not ready to transmit.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:24:24 -07:00
Roopa Prabhu
fc6e8073f3 neighbour: send netlink notification if NTF_ROUTER changes
send netlink notification if neigh_update results in NTF_ROUTER
change and if NEIGH_UPDATE_F_ISROUTER is on. Also move the
NTF_ROUTER change function into a helper.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:21:32 -07:00
Roopa Prabhu
f7aa74e483 neighbour: allow admin to set NTF_ROUTER
This patch allows admin setting of NTF_ROUTER flag
on a neighbour entry. This enables external control
plane (like bgp evpn) to manage neigh entries with
NTF_ROUTER flag.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:21:32 -07:00
Saif Hasan
d8e2262a50 mpls: allow routes on ip6gre devices
Summary:

This appears to be necessary and sufficient change to enable `MPLS` on
`ip6gre` tunnels (RFC4023).

This diff allows IP6GRE devices to be recognized by MPLS kernel module
and hence user can configure interface to accept packets with mpls
headers as well setup mpls routes on them.

Test Plan:

Test plan consists of multiple containers connected via GRE-V6 tunnel.
Then carrying out testing steps as below.

- Carry out necessary sysctl settings on all containers

```
sysctl -w net.mpls.platform_labels=65536
sysctl -w net.mpls.ip_ttl_propagate=1
sysctl -w net.mpls.conf.lo.input=1
```

- Establish IP6GRE tunnels

```
ip -6 tunnel add name if_1_2_1 mode ip6gre \
  local 2401:db00:21:6048:feed:0::1 \
  remote 2401:db00:21:6048:feed:0::2 key 1
ip link set dev if_1_2_1 up
sysctl -w net.mpls.conf.if_1_2_1.input=1
ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link

ip -6 tunnel add name if_1_3_1 mode ip6gre \
  local 2401:db00:21:6048:feed:0::1 \
  remote 2401:db00:21:6048:feed:0::3 key 1
ip link set dev if_1_3_1 up
sysctl -w net.mpls.conf.if_1_3_1.input=1
ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
```

- Install MPLS encap rules on node-1 towards node-2

```
ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
  via inet 169.254.0.3 dev if_1_2_1
```

- Install MPLS forwarding rules on node-2 and node-3
```
// node2
ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1

// node3
ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
```

- Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing
  towards 192.168.0.1 is via IP route directly towards node1 from node4)
```
ping 192.168.0.11
```

- tcpdump on interface to capture ping packets wrapped within MPLS
  header which inturn wrapped within IP6GRE header

```
16:43:41.121073 IP6
  2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2:
  DSTOPT GREv0, key=0x1, length 100:
  MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255)
  IP 192.168.0.1 > 192.168.0.11:
  ICMP echo request, id 1208, seq 45, length 64

0x0000:  6000 2cdb 006c 3c3f 2401 db00 0021 6048  `.,..l<?$....!`H
0x0010:  feed 0000 0000 0001 2401 db00 0021 6048  ........$....!`H
0x0020:  feed 0000 0000 0002 2f00 0401 0401 0100  ......../.......
0x0030:  2000 8847 0000 0001 0002 00ff 0004 01ff  ...G............
0x0040:  4500 0054 3280 4000 ff01 c7cb c0a8 0001  E..T2.@.........
0x0050:  c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b  ...........--<.[
0x0060:  0000 0000 bcd8 0100 0000 0000 1011 1213  ................
0x0070:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
0x0080:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
0x0090:  3435 3637                                4567
```

Signed-off-by: Saif Hasan <has@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:19:27 -07:00
Eelco Chaudron
28169abadb net/sched: Add hardware specific counters to TC actions
Add additional counters that will store the bytes/packets processed by
hardware. These will be exported through the netlink interface for
displaying by the iproute2 tc tool

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:18:42 -07:00
Eelco Chaudron
5e111210a4 net/core: Add new basic hardware counter
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can
be used to count packets/bytes processed by hardware offload.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24 12:18:42 -07:00
Eric Dumazet
ac3d9dd034 netpoll: make ndo_poll_controller() optional
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().

NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.

Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.

Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 21:55:24 -07:00
David S. Miller
16fdf8ba98 rds: Fix build regression.
Use DECLARE_* not DEFINE_*

Fixes: 8360ed6745 ("RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats")
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 12:25:15 -07:00
Maciej Żenczykowski
474ff26008 net-ethtool: ETHTOOL_GUFO did not and should not require CAP_NET_ADMIN
So it should not fail with EPERM even though it is no longer implemented...

This is a fix for:
  (userns)$ egrep ^Cap /proc/self/status
  CapInh: 0000003fffffffff
  CapPrm: 0000003fffffffff
  CapEff: 0000003fffffffff
  CapBnd: 0000003fffffffff
  CapAmb: 0000003fffffffff

  (userns)$ tcpdump -i usb_rndis0
  tcpdump: WARNING: usb_rndis0: SIOCETHTOOL(ETHTOOL_GUFO) ioctl failed: Operation not permitted
  Warning: Kernel filter failed: Bad file descriptor
  tcpdump: can't remove kernel filter: Bad file descriptor

With this change it returns EOPNOTSUPP instead of EPERM.

See also https://github.com/the-tcpdump-group/libpcap/issues/689

Fixes: 08a00fea6d "net: Remove references to NETIF_F_UFO from ethtool."
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-22 17:11:26 -07:00
Nathan Chancellor
5b9b0a80aa decnet: Remove unnecessary check for dev->name
Clang warns that the address of a pointer will always evaluated as true
in a boolean context.

net/decnet/dn_dev.c:1366:10: warning: address of array 'dev->name' will
always evaluate to 'true' [-Wpointer-bool-conversion]
                                dev->name ? dev->name : "???",
                                ~~~~~^~~~ ~
1 warning generated.

Link: https://github.com/ClangBuiltLinux/linux/issues/116
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:48:36 -07:00
Peter Oskolkov
8361962392 net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net
Currently, ip[6]frag_high_thresh sysctl values in new namespaces are
hard-limited to those of the root/init ns.

There are at least two use cases when it would be desirable to
set the high_thresh values higher in a child namespace vs the global hard
limit:

- a security/ddos protection policy may lower the thresholds in the
  root/init ns but allow for a special exception in a child namespace
- testing: a test running in a namespace may want to set these
  thresholds higher in its namespace than what is in the root/init ns

The new behavior:

 # ip netns add testns
 # ip netns exec testns bash

 # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl net.ipv4.ipfrag_high_thresh
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
 net.ipv6.ip6frag_high_thresh = 9000000

 # sysctl net.ipv6.ip6frag_high_thresh
 net.ipv6.ip6frag_high_thresh = 9000000

The old behavior:

 # ip netns add testns
 # ip netns exec testns bash

 # sysctl -w net.ipv4.ipfrag_high_thresh=9000000
 net.ipv4.ipfrag_high_thresh = 9000000

 # sysctl net.ipv4.ipfrag_high_thresh
 net.ipv4.ipfrag_high_thresh = 4194304

 # sysctl -w net.ipv6.ip6frag_high_thresh=9000000
 net.ipv6.ip6frag_high_thresh = 9000000

 # sysctl net.ipv6.ip6frag_high_thresh
 net.ipv6.ip6frag_high_thresh = 4194304

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:45:52 -07:00
Peter Oskolkov
2475f59c61 ipv6: discard IP frag queue on more errors
This is similar to how ipv4 now behaves:
commit 0ff89efb52 ("ip: fail fast on IP defrag errors").

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:45:52 -07:00
Nathan Chancellor
8360ed6745 RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
Clang warns when two declarations' section attributes don't match.

net/rds/ib_stats.c:40:1: warning: section does not match previous
declaration [-Wsection]
DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:142:2: note: expanded from macro
'DEFINE_PER_CPU_SHARED_ALIGNED'
        DEFINE_PER_CPU_SECTION(type, name,
PER_CPU_SHARED_ALIGNED_SECTION) \
        ^
./include/linux/percpu-defs.h:93:9: note: expanded from macro
'DEFINE_PER_CPU_SECTION'
        extern __PCPU_ATTRS(sec) __typeof__(type) name;
\
               ^
./include/linux/percpu-defs.h:49:26: note: expanded from macro
'__PCPU_ATTRS'
        __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
\
                                ^
net/rds/ib.h:446:1: note: previous attribute is here
DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:111:2: note: expanded from macro
'DECLARE_PER_CPU'
        DECLARE_PER_CPU_SECTION(type, name, "")
        ^
./include/linux/percpu-defs.h:87:9: note: expanded from macro
'DECLARE_PER_CPU_SECTION'
        extern __PCPU_ATTRS(sec) __typeof__(type) name
               ^
./include/linux/percpu-defs.h:49:26: note: expanded from macro
'__PCPU_ATTRS'
        __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
\
                                ^
1 warning generated.

The initial definition was added in commit ec16227e14 ("RDS/IB:
Infiniband transport") and the cache aligned definition was added in
commit e6babe4cc4 ("RDS/IB: Stats and sysctls") right after. The
definition probably should have been updated in net/rds/ib.h, which is
what this patch does.

Link: https://github.com/ClangBuiltLinux/linux/issues/114
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:44:07 -07:00
Eric Dumazet
075e264fa3 net/ipv4: avoid compile error in fib_info_nh_uses_dev
net/ipv4/fib_frontend.c: In function 'fib_info_nh_uses_dev':
net/ipv4/fib_frontend.c:322:6: error: unused variable 'ret' [-Werror=unused-variable]
cc1: all warnings being treated as errors

Fixes: 78f2756c5f ("net/ipv4: Move device validation to helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:41:30 -07:00
Eric Dumazet
90caf67b01 net_sched: sch_fq: remove dead code dealing with retransmits
With the earliest departure time model, we no longer plan
special casing TCP retransmits. We therefore remove dead
code (since most compilers understood skb_is_retransmit()
was false)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:38:00 -07:00
Eric Dumazet
c092dd5f4a tcp: switch tcp_internal_pacing() to tcp_wstamp_ns
Now TCP keeps track of tcp_wstamp_ns, recording the earliest
departure time of next packet, we can remove duplicate code
from tcp_internal_pacing()

This removes one ktime_get_tai_ns() call, and a divide.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:38:00 -07:00
Eric Dumazet
ab408b6dc7 tcp: switch tcp and sch_fq to new earliest departure time model
TCP keeps track of tcp_wstamp_ns by itself, meaning sch_fq
no longer has to do it.

Thanks to this model, TCP can get more accurate RTT samples,
since pacing no longer inflates them.

This has the nice effect of removing some delays caused by FQ
quantum mechanism, causing inflated max/P99 latencies.

Also we might relax TCP Small Queue tight limits in the future,
since this new model allow TCP to build bigger batches, since
sch_fq (or a device with earliest departure time offload) ensure
these packets will be delivered on time.

Note that other protocols are not converted (they will probably
never be) so sch_fq has still support for SO_MAX_PACING_RATE

Tested:

Test showing FQ pacing quantum artifact for low-rate flows,
adding unexpected throttles for RPC flows, inflating max and P99 latencies.

The parameters chosen here are to show what happens typically when
a TCP flow has a reduced pacing rate (this can be caused by a reduced
cwin after few losses, or/and rtt above few ms)

MIBS="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
Before :
$ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
 Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
19,82.78,5279,3825,482.02

After :
$ netperf -H 10.246.7.133 -t TCP_RR -Cc -T6,6 -- -q 2000000 -r 100,100 -o $MIBS
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.133 () port 0 AF_INET : first burst 0 : cpu bind
Minimum Latency Microseconds,Mean Latency Microseconds,Maximum Latency Microseconds,99th Percentile Latency Microseconds,Stddev Latency Microseconds
20,49.94,128,63,3.18

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:38:00 -07:00
Eric Dumazet
fd2bca2aa7 tcp: switch internal pacing timer to CLOCK_TAI
Next patch will use tcp_wstamp_ns to feed internal
TCP pacing timer, so switch to CLOCK_TAI to share same base.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Eric Dumazet
d3edd06ea8 tcp: provide earliest departure time in skb->tstamp
Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
from usec units to nsec units.

Do not clear skb->tstamp before entering IP stacks in TX,
so that qdisc or devices can implement pacing based on the
earliest departure time instead of socket sk->sk_pacing_rate

Packets are fed with tcp_wstamp_ns, and following patch
will update tcp_wstamp_ns when both TCP and sch_fq switch to
the earliest departure time mechanism.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Eric Dumazet
9799ccb0e9 tcp: add tcp_wstamp_ns socket field
TCP will soon provide earliest departure time on TX skbs.
It needs to track this in a new variable.

tcp_mstamp_refresh() needs to update this variable, and
became too big to stay an inline.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Eric Dumazet
142537e419 net_sched: sch_fq: switch to CLOCK_TAI
TCP will soon provide per skb->tstamp with earliest departure time,
so that sch_fq does not have to determine departure time by looking
at socket sk_pacing_rate.

We chose in linux-4.19 CLOCK_TAI as the clock base for transports,
qdiscs, and NIC offloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Eric Dumazet
2fd66ffba5 tcp: introduce tcp_skb_timestamp_us() helper
There are few places where TCP reads skb->skb_mstamp expecting
a value in usec unit.

skb->tstamp (aka skb->skb_mstamp) will soon store CLOCK_TAI nsec value.

Add tcp_skb_timestamp_us() to provide proper conversion when needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Dan Carpenter
83fe9a9661 devlink: double free in devlink_resource_fill()
Smatch reports that devlink_dpipe_send_and_alloc_skb() frees the skb
on error so this is a double free.  We fixed a bunch of these bugs in
commit 7fe4d6dcbc ("devlink: Remove redundant free on error path") but
we accidentally overlooked this one.

Fixes: d9f9b9a4d0 ("devlink: Add support for resource abstraction")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:19:07 -07:00
Vakul Garg
a42055e8d2 net/tls: Add support for async encryption of records for performance
In current implementation, tls records are encrypted & transmitted
serially. Till the time the previously submitted user data is encrypted,
the implementation waits and on finish starts transmitting the record.
This approach of encrypt-one record at a time is inefficient when
asynchronous crypto accelerators are used. For each record, there are
overheads of interrupts, driver softIRQ scheduling etc. Also the crypto
accelerator sits idle most of time while an encrypted record's pages are
handed over to tcp stack for transmission.

This patch enables encryption of multiple records in parallel when an
async capable crypto accelerator is present in system. This is achieved
by allowing the user space application to send more data using sendmsg()
even while previously issued data is being processed by crypto
accelerator. This requires returning the control back to user space
application after submitting encryption request to accelerator. This
also means that zero-copy mode of encryption cannot be used with async
accelerator as we must be done with user space application buffer before
returning from sendmsg().

There can be multiple records in flight to/from the accelerator. Each of
the record is represented by 'struct tls_rec'. This is used to store the
memory pages for the record.

After the records are encrypted, they are added in a linked list called
tx_ready_list which contains encrypted tls records sorted as per tls
sequence number. The records from tx_ready_list are transmitted using a
newly introduced function called tls_tx_records(). The tx_ready_list is
polled for any record ready to be transmitted in sendmsg(), sendpage()
after initiating encryption of new tls records. This achieves parallel
encryption and transmission of records when async accelerator is
present.

There could be situation when crypto accelerator completes encryption
later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore
we need a deferred work context to be able to transmit records from
tx_ready_list. The deferred work context gets scheduled if applications
are not sending much data through the socket. If the applications issue
sendmsg()/sendpage() in quick succession, then the scheduling of
tx_work_handler gets cancelled as the tx_ready_list would be polled from
application's context itself. This saves scheduling overhead of deferred
work.

The patch also brings some side benefit. We are able to get rid of the
concept of CLOSED record. This is because the records once closed are
either encrypted and then placed into tx_ready_list or if encryption
fails, the socket error is set. This simplifies the kernel tls
sendpath. However since tls_device.c is still using macros, accessory
functions for CLOSED records have been retained.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:17:34 -07:00
Jeff Barnhill
86f9bd1ff6 net/ipv6: Display all addresses in output of /proc/net/if_inet6
The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
handle starting/stopping the iteration.  The problem is that at some point
during the iteration, an overflow is detected and the process is
subsequently stopped.  The item being shown via seq_printf() when the
overflow occurs is not actually shown, though.  When start() is
subsequently called to resume iterating, it returns the next item, and
thus the item that was being processed when the overflow occurred never
gets printed.

Alter the meaning of the private data member "offset".  Currently, when it
is not 0 (which only happens at the very beginning), "offset" represents
the next hlist item to be printed.  After this change, "offset" always
represents the current item.

This is also consistent with the private data member "bucket", which
represents the current bucket, and also the use of "pos" as defined in
seq_file.txt:
    The pos passed to start() will always be either zero, or the most
    recent pos used in the previous session.

Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:10:49 -07:00
Sean Tranchetti
f88b4c01b9 netlabel: check for IPV4MASK in addrinfo_get
netlbl_unlabel_addrinfo_get() assumes that if it finds the
NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is
not necessarily the case as the current checks in
netlbl_unlabel_staticadd() and friends are not sufficent to
enforce this.

If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
these functions will all call netlbl_unlabel_addrinfo_get() which
will then attempt dereference NULL when fetching the non-existent
NLBL_UNLABEL_A_IPV4MASK attribute:

Unable to handle kernel NULL pointer dereference at virtual address 0
Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
Call trace:
	netlbl_unlabel_addrinfo_get+0x44/0xd8
	netlbl_unlabel_staticremovedef+0x98/0xe0
	genl_rcv_msg+0x354/0x388
	netlink_rcv_skb+0xac/0x118
	genl_rcv+0x34/0x48
	netlink_unicast+0x158/0x1f0
	netlink_sendmsg+0x32c/0x338
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Code: 51001149 7100113f 540000a0 f9401508 (79400108)
---[ end trace f6438a488e737143 ]---
Kernel panic - not syncing: Fatal exception

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 18:58:34 -07:00
zhong jiang
f2a2f2166a ipv6: remove redundant null pointer check before kfree_skb
kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 09:04:37 -07:00
zhong jiang
e3c3215e41 net: nci: remove redundant null pointer check before kfree_skb
kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 09:04:37 -07:00
zhong jiang
1d08962ff1 ipv4: remove redundant null pointer check before kfree_skb
kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 09:04:37 -07:00
Vlad Buslov
ec3ed293e7 net_sched: change tcf_del_walker() to take idrinfo->lock
Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking a
reference or idrinfo->lock first, and deletes them directly, disregarding
possible concurrent delete.

Change tcf_del_walker() to take idrinfo->lock while iterating over actions
and use new tcf_idr_release_unsafe() to release them while holding the
lock.

And the blocking function fl_hw_destroy_tmplt() could be called when we
put a filter chain, so defer it to a work queue.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
[xiyou.wangcong@gmail.com: heavily modify the code and changelog]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 08:55:05 -07:00
Florian Westphal
9306425b70 netfilter: ctnetlink: must check mark attributes vs NULL
else we will oops (null deref) when the attributes aren't present.

Also add back the EOPNOTSUPP in case MARK filtering is requested but
kernel doesn't support it.

Fixes: 59c08c69c2 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
Reported-by: syzbot+e45eda8eda6e93a03959@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-21 10:14:46 +02:00
David Ahern
9f18b6b68e netfilter: nft_fib: Convert nft_fib4_eval to new dev helper
Convert nft_fib4_eval to the new device checking helper and
remove the duplicate code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20 20:01:53 -07:00
David Ahern
91a178258a netfilter: rpfilter: Convert rpfilter_lookup_reverse to new dev helper
Convert rpfilter_lookup_reverse to the new device checking helper
and remove the duplicate code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20 20:01:52 -07:00
David Ahern
78f2756c5f net/ipv4: Move device validation to helper
Move the device matching check in __fib_validate_source to a helper and
export it for use by netfilter modules. Code move only; no functional
change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20 20:01:52 -07:00
Xin Long
d7ab5cdce5 sctp: update dst pmtu with the correct daddr
When processing pmtu update from an icmp packet, it calls .update_pmtu
with sk instead of skb in sctp_transport_update_pmtu.

However for sctp, the daddr in the transport might be different from
inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or
create the route cache. The incorrect daddr will cause a different
route cache created for the path.

So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr
should be updated with the daddr in the transport, and update it back
after it's done.

The issue has existed since route exceptions introduction.

Fixes: 4895c771c7 ("ipv4: Add FIB nexthop exceptions.")
Reported-by: ian.periam@dialogic.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20 11:29:30 -07:00
Mauro Carvalho Chehab
c03e2fa753 docs: fix some broken documentation references
Some documentation files received recent changes and are
pointing to wrong places.

Those references can easily fixed with the help of a
script:

	$ ./scripts/documentation-file-ref-check --fix

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-09-20 11:09:53 -06:00
zhong jiang
346fa83d10 netfilter: conntrack: get rid of double sizeof
sizeof(sizeof()) is quite strange and does not seem to be what
is wanted here.

The issue is detected with the help of Coccinelle.

Fixes: 3921584674 ("netfilter: conntrack: remove nlattr_size pointer from l4proto trackers")
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:40:32 +02:00
Stefan Agner
bab4344975 netfilter: nft_osf: use enum nft_data_types for nft_validate_register_store
The function nft_validate_register_store requires a struct of type
struct nft_data_types. NFTA_DATA_VALUE is of type enum
nft_verdict_attributes. Pass the correct enum type.

This fixes a warning seen with Clang:
  net/netfilter/nft_osf.c:52:8: warning: implicit conversion from
    enumeration type 'enum nft_data_attributes' to different enumeration
    type 'enum nft_data_types' [-Wenum-conversion]
                                          NFTA_DATA_VALUE, NFT_OSF_MAXGENRELEN);
                                          ^~~~~~~~~~~~~~~

Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Link: https://github.com/ClangBuiltLinux/linux/issues/103
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:38:40 +02:00
Taehee Yoo
0de22baabc netfilter: nf_tables: use rhashtable_walk_enter instead of rhashtable_walk_init
rhashtable_walk_init() is deprecated and rhashtable_walk_enter() can be
used instead. rhashtable_walk_init() is wrapper function of
rhashtable_walk_enter() so that logic is actually same.
But rhashtable_walk_enter() doesn't return error hence error path
code can be removed.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:26:56 +02:00
Taehee Yoo
f8b0a3ab06 netfilter: nat: remove duplicate skb_is_nonlinear() in __nf_nat_mangle_tcp_packet()
__nf_nat_mangle_tcp_packet() and nf_nat_mangle_udp_packet() call
mangle_contents(). and __nf_nat_mangle_tcp_packet()
and mangle_contents() call skb_is_nonlinear(). so that
skb_is_nonlinear() in __nf_nat_mangle_tcp_packet() is unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:26:56 +02:00
David Ahern
a173f066c7 netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev
For starters, the bridge netfilter code registers operations that
are invoked any time nh_hook is called. Specifically, ip_sabotage_in
watches for nested calls for NF_INET_PRE_ROUTING when a bridge is in
the stack.

Packet wise, the bridge netfilter hook runs first. br_nf_pre_routing
allocates nf_bridge, sets in_prerouting to 1 and calls NF_HOOK for
NF_INET_PRE_ROUTING. It's finish function, br_nf_pre_routing_finish,
then resets in_prerouting flag to 0 and the packet continues up the
stack. The packet eventually makes it to the VRF driver and it invokes
nf_hook for NF_INET_PRE_ROUTING in case any rules have been added against
the vrf device.

Because of the registered operations the call to nf_hook causes
ip_sabotage_in to be invoked. That function sees the nf_bridge on the
skb and that in_prerouting is not set. Thinking it is an invalid nested
call it steals (drops) the packet.

Update ip_sabotage_in to recognize that the bridge or one of its upper
devices (e.g., vlan) can be enslaved to a VRF (L3 master device) and
allow the packet to go through the nf_hook a second time.

Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Reported-by: D'Souza, Nelson <ndsouza@ciena.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:24:14 +02:00
Florian Westphal
93185c80a5 netfilter: conntrack: clamp l4proto array size at largers supported protocol
All higher l4proto numbers are handled by the generic tracker; the
l4proto lookup function already returns generic one in case the l4proto
number exceeds max size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:08:14 +02:00
Florian Westphal
dd2934a957 netfilter: conntrack: remove l3->l4 mapping information
l4 protocols are demuxed by l3num, l4num pair.

However, almost all l4 trackers are l3 agnostic.

Only exceptions are:
 - gre, icmp (ipv4 only)
 - icmpv6 (ipv6 only)

This commit gets rid of the l3 mapping, l4 trackers can now be looked up
by their IPPROTO_XXX value alone, which gets rid of the additional l3
indirection.

For icmp, ipcmp6 and gre, add a check on state->pf and
return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4,
this seems more fitting than using the generic tracker.

Additionally we can kill the 2nd l4proto definitions that were needed
for v4/v6 split -- they are now the same so we can use single l4proto
struct for each protocol, rather than two.

The EXPORT_SYMBOLs can be removed as all these object files are
part of nf_conntrack with no external references.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:07:35 +02:00
Florian Westphal
ca2ca6e1c0 netfilter: conntrack: remove unused proto arg from netns init functions
Its unused, next patch will remove l4proto->l3proto number to simplify
l4 protocol demuxer lookup.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:03:50 +02:00
Florian Westphal
6fe78fa484 netfilter: conntrack: remove error callback and handle icmp from core
icmp(v6) are the only two layer four protocols that need the error()
callback (to handle icmp errors that are related to an established
connections, e.g. packet too big, port unreachable and the like).

Remove the error callback and handle these two special cases from the core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:02:57 +02:00
Florian Westphal
0150ffbac7 netfilter: conntrack: avoid using ->error callback if possible
The error() handler gets called before allocating or looking up a
connection tracking entry.

We can instead use direct calls from the ->packet() handlers which get
invoked for every packet anyway.

Only exceptions are icmp and icmpv6, these two special cases will be
handled in the next patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:02:42 +02:00
Florian Westphal
83d213fd9d netfilter: conntrack: deconstify packet callback skb pointer
Only two protocols need the ->error() function: icmp and icmpv6.
This is because icmp error mssages might be RELATED to an existing
connection (e.g. PMTUD, port unreachable and the like), and their
->error() handlers do this.

The error callback is already optional, so remove it for
udp and call them from ->packet() instead.

As the error() callback can call checksum functions that write to
skb->csum*, the const qualifier has to be removed as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 18:02:22 +02:00
Florian Westphal
9976fc6e6e netfilter: conntrack: remove the l4proto->new() function
->new() gets invoked after ->error() and before ->packet() if
a conntrack lookup has found no result for the tuple.

We can fold it into ->packet() -- the packet() implementations
can check if the conntrack is confirmed (new) or not
(already in hash).

If its unconfirmed, the conntrack isn't in the hash yet so current
skb created a new conntrack entry.

Only relevant side effect -- if packet() doesn't return NF_ACCEPT
but -NF_ACCEPT (or drop), while the conntrack was just created,
then the newly allocated conntrack is freed right away, rather than not
created in the first place.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 17:57:17 +02:00
Florian Westphal
93e66024b0 netfilter: conntrack: pass nf_hook_state to packet and error handlers
nf_hook_state contains all the hook meta-information: netns, protocol family,
hook location, and so on.

Instead of only passing selected information, pass a pointer to entire
structure.

This will allow to merge the error and the packet handlers and remove
the ->new() function in followup patches.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20 17:54:37 +02:00
Johannes Berg
56ce3c5a50 smc: generic netlink family should be __ro_after_init
The generic netlink family is only initialized during module init,
so it should be __ro_after_init like all other generic netlink
families.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20 07:49:55 -07:00
Sean Tranchetti
32bf94fb5c xfrm: validate template mode
XFRM mode parameters passed as part of the user templates
in the IP_XFRM_POLICY are never properly validated. Passing
values other than valid XFRM modes can cause stack-out-of-bounds
reads to occur later in the XFRM processing:

[  140.535608] ================================================================
[  140.543058] BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x17e4/0x1cc4
[  140.550306] Read of size 4 at addr ffffffc0238a7a58 by task repro/5148
[  140.557369]
[  140.558927] Call trace:
[  140.558936] dump_backtrace+0x0/0x388
[  140.558940] show_stack+0x24/0x30
[  140.558946] __dump_stack+0x24/0x2c
[  140.558949] dump_stack+0x8c/0xd0
[  140.558956] print_address_description+0x74/0x234
[  140.558960] kasan_report+0x240/0x264
[  140.558963] __asan_report_load4_noabort+0x2c/0x38
[  140.558967] xfrm_state_find+0x17e4/0x1cc4
[  140.558971] xfrm_resolve_and_create_bundle+0x40c/0x1fb8
[  140.558975] xfrm_lookup+0x238/0x1444
[  140.558977] xfrm_lookup_route+0x48/0x11c
[  140.558984] ip_route_output_flow+0x88/0xc4
[  140.558991] raw_sendmsg+0xa74/0x266c
[  140.558996] inet_sendmsg+0x258/0x3b0
[  140.559002] sock_sendmsg+0xbc/0xec
[  140.559005] SyS_sendto+0x3a8/0x5a8
[  140.559008] el0_svc_naked+0x34/0x38
[  140.559009]
[  140.592245] page dumped because: kasan: bad access detected
[  140.597981] page_owner info is not active (free page?)
[  140.603267]
[  140.653503] ================================================================

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-09-20 08:30:42 +02:00
Paolo Abeni
76c0ddd8c3 ip6_tunnel: be careful when accessing the inner header
the ip6 tunnel xmit ndo assumes that the processed skb always
contains an ip[v6] header, but syzbot has found a way to send
frames that fall short of this assumption, leading to the following splat:

BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
[inline]
BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
net/ipv6/ip6_tunnel.c:1390
CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:53
  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
  ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
  ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
  __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
  netdev_start_xmit include/linux/netdevice.h:4075 [inline]
  xmit_one net/core/dev.c:3026 [inline]
  dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
  __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
  packet_snd net/packet/af_packet.c:2944 [inline]
  packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x441819
RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
  kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
  kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
  kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
  slab_post_alloc_hook mm/slab.h:445 [inline]
  slab_alloc_node mm/slub.c:2737 [inline]
  __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
  __kmalloc_reserve net/core/skbuff.c:138 [inline]
  __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
  alloc_skb include/linux/skbuff.h:984 [inline]
  alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
  sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
  packet_alloc_skb net/packet/af_packet.c:2803 [inline]
  packet_snd net/packet/af_packet.c:2894 [inline]
  packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg net/socket.c:640 [inline]
  ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
  __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
  SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
  SyS_sendmmsg+0x63/0x90 net/socket.c:2162
  do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

This change addresses the issue adding the needed check before
accessing the inner header.

The ipv4 side of the issue is apparently there since the ipv4 over ipv6
initial support, and the ipv6 side predates git history.

Fixes: c4d3efafcc ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 21:24:28 -07:00
Robert Shearman
3ede0bbcdf ipv6: Allow the l3mdev to be a loopback
There is no way currently for an IPv6 client connect using a loopback
address in a VRF, whereas for IPv4 the loopback address can be added:

    $ sudo ip addr add dev vrfred 127.0.0.1/8
    $ sudo ip -6 addr add ::1/128 dev vrfred
    RTNETLINK answers: Cannot assign requested address

So allow ::1 to be configured on an L3 master device. In order for
this to be usable ip_route_output_flags needs to not consider ::1 to
be a link scope address (since oif == l3mdev and so it would be
dropped), and ipv6_rcv needs to consider the l3mdev to be a loopback
device so that it doesn't drop the packets.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 21:23:01 -07:00
Heiner Kallweit
124eee3f69 net: linkwatch: add check for netdevice being present to linkwatch_do_dev
When bringing down the netdevice (incl. detaching it) and calling
netif_carrier_off directly or indirectly the latter triggers an
asynchronous linkwatch event.
This linkwatch event eventually may fail to access chip registers in
the ndo_get_stats/ndo_get_stats64 callback because the device isn't
accessible any longer, see call trace in [0].

To prevent this scenario don't check for IFF_UP only, but also make
sure that the netdevice is present.

[0] https://lists.openwall.net/netdev/2018/03/15/62

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 21:06:46 -07:00
zhong jiang
f195efb47d net: core: Use FIELD_SIZEOF directly instead of reimplementing its function
FIELD_SIZEOF is defined as a macro to calculate the specified value. Therefore,
We prefer to use the macro rather than calculating its value.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 20:58:04 -07:00
zhong jiang
cb205a8174 net: sched: Use FIELD_SIZEOF directly instead of reimplementing its function
FIELD_SIZEOF is defined as a macro to calculate the specified value. Therefore,
We prefer to use the macro rather than calculating its value.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 20:58:04 -07:00
zhong jiang
86c85e1fc4 net: iucv: Use FIELD_SIZEOF directly instead of reimplementing its function
FIELD_SIZEOF is defined as a macro to calculate the specified value. Therefore,
We prefer to use the macro rather than calculating its value.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 20:58:04 -07:00
David S. Miller
18522108d5 This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Inform users about debugfs interface deprecation, by Sven Eckelmann
 
  - Implement tracing, planned to replace debugfs log messages,
    by Sven Eckelmann
 
  - Move OGM rebroadcasts to per interface struct, by Sven Eckelmann
 
  - Enable LockLess TX to increase throughput, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAluiQXoWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoTreD/9jxzBgB7PXGPJieaGDkVrmgEgO
 mC9PHvDjL58iFAUuWA+OYLx5oW1e3u+YrDBJsd/zD4+L7RbyJYoX546A2X2kpccj
 1yiAbLLs6oYk4vnxIWEnaz2pWJXjh+wRADKcnO1LkMnY4MatuLRDPKRib9FRwq7n
 XvOIg0Md6Wzds4d8dM1tBz3KsZJiP4I5NcvhMzYmsFNl51JJpcECG6KweBV3qrmY
 ZpG/vmUo37D0xv4/15TpL5aj6EpC8nxUp8XJqq0zRnvOx/2VgaKz5Qb8dewN/pUt
 rxTVtBlJX/EXCU+m81mteEhGii9DhTzKXhkho/eJKdsxV2EAf1qxr4/Bbz6ciGs4
 SJuAq0q5owtr/hgAihUXAC0fnOZOw1xOjt2B+snRY4VjoPIg8kr8s4GXupqqFCXY
 DwnWSvt37WZg0D3N+jXU2eNdzdZqm/pkU1yvUUpRd5KNFHHvM+yw+BoeEf8rMcuY
 AefU3ZSE6cOt5MKVozDFwxF24hBTd7CNgi4uD+SWOTtUc7Yj0xIJyY7Yle6q5p3f
 wX77RmS2VKLrxyhFbxGNsNDMiVTqAEJQglmuy3BACy7YVJz07lSUSGbhVowLGk74
 tkgOS5hwVcDgq3o8fdVgTGaj2T5WPf1FDFdZ+g4QvqjTKm0pLHWKc8KEsSerUZfw
 Fjm8DrB3KCXvt6uD+g==
 =XZYf
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20180919' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Inform users about debugfs interface deprecation, by Sven Eckelmann

 - Implement tracing, planned to replace debugfs log messages,
   by Sven Eckelmann

 - Move OGM rebroadcasts to per interface struct, by Sven Eckelmann

 - Enable LockLess TX to increase throughput, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 20:35:37 -07:00
David S. Miller
aa86b03c3e Here are some batman-adv bugfixes:
- Avoid ELP information leak, by Sven Eckelmann
 
  - Fix sysfs segfault issues, by Sven Eckelmann (2 patches)
 
  - Fix locking when adding entries in various lists,
    by Sven Eckelmann (5 patches)
 
  - Fix refcount if queue_work() fails, by Marek Lindner (2 patches)
 
  - Fixup forgotten version bump, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAluiO2gWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoRrcD/9F8CZnQusMuvLe/8sxWiyb70zx
 Fx8dsZS7ppMN4SoQU5dB9NbgDwzDdK1L42Fgzm2CfgeEIYzUOsw6l/+KFuXQSU/t
 Sb/NloyFNQ5nGHO6IcLY8Mqa6bMX/6dlBinRcuTyXOECqqBAaYTf56olNyUty3K6
 9BHfyule+oZtQ5aN6DhOsVB0ATxGoqJK2lbAePj+mzb+5253YykUhjAaEVGM0oNN
 3ULq34dyPxPb2t/z0qNK8CJ2+RXPrAqssEaD0X0awl7spFJYt4jvAXwKvHnR/ryo
 6EDrvnd38bcs+DJrNHhycyk+bITT6njnTv4kJtQuFAd+eap5h7RR5mAElSBgWSAB
 Xy0HjwNY/mb/OMY85/o2EjLOrmr31YpJcLWI6sLqPrUbD4WGhoW1ZB71AQCL3nHL
 6YLln+mYCK00KT5hPU9MzO/3nGyd3uvRjKLLd42chC+YZXxvhWFFEkpsEGYJZq/q
 qduWytS7+W+FSUrDEgTePJh6rdwAbFLSUWqW5bBxA5ATctxxbsslF8IvUJy6fCel
 Sp+iNIkjfp+eB8jFaacrrw4At1YE2nDJlyycMvLQvQOaSCtpGuKiTqsqTHFbgSy4
 Y7QTHNLLbguiASCuhVvP5EYsBm8ZqNUjXMJZCHOCVaXp/cQ3K7Z6yC2pEst2QJ6m
 9lrPOg32Rfo9dRNcTA==
 =B1qZ
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20180919' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
pull request for net: batman-adv 2018-09-19

here are some bugfixes which we would like to see integrated into net.

We forgot to bump the version number in the last round for net-next, so
the belated patch to do that is included - we hope you can adopt it.
This will most likely create a merge conflict later when merging into
net-next with this rounds net-next patchset, but net-next should keep
the 2018.4 version[1].

[1] resolution:

--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -25,11 +25,7 @@
 #define BATADV_DRIVER_DEVICE "batman-adv"

 #ifndef BATADV_SOURCE_VERSION
-<<<<<<<
-#define BATADV_SOURCE_VERSION "2018.3"
-=======
 #define BATADV_SOURCE_VERSION "2018.4"
->>>>>>>
 #endif

 /* B.A.T.M.A.N. parameters */

Please pull or let me know of any problem!

Here are some batman-adv bugfixes:

 - Avoid ELP information leak, by Sven Eckelmann

 - Fix sysfs segfault issues, by Sven Eckelmann (2 patches)

 - Fix locking when adding entries in various lists,
   by Sven Eckelmann (5 patches)

 - Fix refcount if queue_work() fails, by Marek Lindner (2 patches)

 - Fixup forgotten version bump, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 20:35:14 -07:00
Wei Wang
ce7ea4af08 ipv6: fix memory leak on dst->_metrics
When dst->_metrics and f6i->fib6_metrics share the same memory, both
take reference count on the dst_metrics structure. However, when dst is
destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
which does not take care of READONLY metrics and does not release refcnt.
This causes memory leak.
Similar to ipv4 logic, the fix is to properly release refcnt and free
the memory space pointed by dst->_metrics if refcnt becomes 0.

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:17:01 -07:00
Wei Wang
8675860592 Revert "ipv6: fix double refcount of fib6_metrics"
This reverts commit e70a3aad44.

This change causes use-after-free on dst->_metrics.
The crash trace looks like this:
[   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
[   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801

[   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
[   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
[   97.777957] Call Trace:
[   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
[   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
[   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
[   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
[   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
[   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
[   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
[   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
[   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
[   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
[   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
[   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
[   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
[   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
[   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
[   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
[   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
[   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
[   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
[   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
[   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
[   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
[   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
[   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
[   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
[   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
[   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
[   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
[   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
[   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
[   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
[   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
[   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
[   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
[   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
[   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
[   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
[   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
[   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
[   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
[   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
[   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
[   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   97.778177] RIP: 0033:0x7f83fa36000d
[   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
[   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
[   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
[   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
[   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000

[   97.779684] Allocated by task 5919:
[   97.783185]  save_stack+0x46/0xd0
[   97.783187]  kasan_kmalloc+0xad/0xe0
[   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
[   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
[   97.783192]  ip6_route_info_create+0x60a/0x2480
[   97.783193]  ip6_route_add+0x1d/0x80
[   97.783195]  inet6_rtm_newroute+0xdd/0xf0
[   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
[   97.783200]  netlink_rcv_skb+0x27b/0x3e0
[   97.783202]  rtnetlink_rcv+0x15/0x20
[   97.783203]  netlink_unicast+0x4be/0x720
[   97.783204]  netlink_sendmsg+0x7bc/0xbf0
[   97.783205]  sock_sendmsg+0xba/0xf0
[   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
[   97.783208]  __sys_sendmsg+0xe6/0x190
[   97.783209]  SyS_sendmsg+0x13/0x20
[   97.783211]  do_syscall_64+0x2ac/0x430
[   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

[   97.784709] Freed by task 0:
[   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
[   97.794497]  save_stack+0x46/0xd0
[   97.794499]  kasan_slab_free+0x71/0xc0
[   97.794500]  kfree+0x7c/0xf0
[   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
[   97.794504]  rcu_process_callbacks+0x38b/0x1730
[   97.794506]  __do_softirq+0x1c8/0x5d0

Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:17:01 -07:00
YueHaibing
381897798a net/smc: fix sizeof to int comparison
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:11:43 -07:00
Karsten Graul
71d117f527 net/smc: no urgent data check for listen sockets
Don't check a listen socket for pending urgent data in smc_poll().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:11:43 -07:00
Ursula Braun
dd65d87a6a net/smc: enable fallback for connection abort in state INIT
If a linkgroup is terminated abnormally already due to failing
LLC CONFIRM LINK or LLC ADD LINK, fallback to TCP is still possible.
In this case do not switch to state SMC_PEERABORTWAIT and do not set
sk_err.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:11:43 -07:00
Ursula Braun
1ca52fcfac net/smc: remove duplicate mutex_unlock
For a failing smc_listen_rdma_finish() smc_listen_decline() is
called. If fallback is possible, the new socket is already enqueued
to be accepted in smc_listen_decline(). Avoid enqueuing a second time
afterwards in this case, otherwise the smc_create_lgr_pending lock
is released twice:
[  373.463976] WARNING: bad unlock balance detected!
[  373.463978] 4.18.0-rc7+ #123 Tainted: G           O
[  373.463979] -------------------------------------
[  373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at:
[  373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
[  373.463991] but there are no more locks to release!
[  373.463991]
other info that might help us debug this:
[  373.463993] 2 locks held by kworker/1:1/30:
[  373.463994]  #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0
[  373.464000]  #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0
[  373.464003]
stack backtrace:
[  373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G           O      4.18.0-rc7uschi+ #123
[  373.464007] Hardware name: IBM 2827 H43 738 (LPAR)
[  373.464010] Workqueue: events smc_listen_work [smc]
[  373.464011] Call Trace:
[  373.464015] ([<0000000000114100>] show_stack+0x60/0xd8)
[  373.464019]  [<0000000000a8c9bc>] dump_stack+0x9c/0xd8
[  373.464021]  [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108
[  373.464022]  [<00000000001e045c>] lock_release+0x114/0x4f8
[  373.464025]  [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300
[  373.464027]  [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
[  373.464029]  [<0000000000197a68>] process_one_work+0x2a8/0x6b0
[  373.464030]  [<0000000000197ec2>] worker_thread+0x52/0x410
[  373.464033]  [<000000000019fd0e>] kthread+0x15e/0x178
[  373.464035]  [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc
[  373.464052]  [<0000000000aaf584>] kernel_thread_starter+0x0/0xc
[  373.464054] INFO: lockdep is turned off.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:11:43 -07:00
Ursula Braun
648a5a7aed net/smc: fix non-blocking connect problem
In state SMC_INIT smc_poll() delegates polling to the internal
CLC socket. This means, once the connect worker has finished
its kernel_connect() step, the poll wake-up may occur. This is not
intended. The wake-up should occur from the wake up call in
smc_connect_work() after __smc_connect() has finished.
Thus in state SMC_INIT this patch now calls sock_poll_wait() on the
main SMC socket.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 20:11:43 -07:00
Suren Baghdasaryan
674d9de02a NFC: Fix possible memory corruption when handling SHDLC I-Frame commands
When handling SHDLC I-Frame commands "pipe" field used for indexing
into an array should be checked before usage. If left unchecked it
might access memory outside of the array of size NFC_HCI_MAX_PIPES(127).

Malformed NFC HCI frames could be injected by a malicious NFC device
communicating with the device being attacked (remote attack vector),
or even by an attacker with physical access to the I2C bus such that
they could influence the data transfers on that bus (local attack vector).
skb->data is controlled by the attacker and has only been sanitized in
the most trivial ways (CRC check), therefore we can consider the
create_info struct and all of its members to tainted. 'create_info->pipe'
with max value of 255 (uint8) is used to take an offset of the
hdev->pipes array of 127 elements which can lead to OOB write.

Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Allen Pais <allen.pais@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Suggested-by: Kevin Deus <kdeus@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 19:55:01 -07:00
David S. Miller
e366fa4350 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Two new tls tests added in parallel in both net and net-next.

Used Stephen Rothwell's linux-next resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 09:33:27 -07:00
Peter Oskolkov
30bfd93062 net/ipv6: do not copy dst flags on rt init
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).

If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
in dist_init() and decremented in dst_destroy().

This flag is tied to allocation/deallocation of dst_entry and
should not be copied from another dst/route. Otherwise it can happen
that dst_ops::pcpuc_entries counter grows until no new routes can
be allocated because the counter reached ip6_rt_max_size due to
DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.

Fixes: 3b6761d18b ("net/ipv6: Move dst flags to booleans in fib entries")
Cc: David Ahern <dsahern@gmail.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 19:42:14 -07:00
Stefan Nuernberger
076ed3da0c net/ipv4: defensive cipso option parsing
commit 40413955ee ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.

Signed-off-by: Stefan Nuernberger <snu@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Simon Veith <sveith@amazon.de>
Cc: stable@vger.kernel.org
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 19:37:46 -07:00
Colin Ian King
5d288b8865 net: caif: remove redundant null check on frontpkt
It is impossible for frontpkt to be null at the point of the null
check because it has been assigned from rearpkt and there is no
way rearpkt can be null at the point of the assignment because
of the sanity checking and exit paths taken previously. Remove
the redundant null check.

Detected by CoverityScan, CID#114434 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 18:49:57 -07:00
David S. Miller
3275b4df3c Revert "kcm: remove any offset before parsing messages"
This reverts commit 072222b488.

I just read that this causes regressions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 18:43:42 -07:00
Dominique Martinet
072222b488 kcm: remove any offset before parsing messages
The current code assumes kcm users know they need to look for the
strparser offset within their bpf program, which is not documented
anywhere and examples laying around do not do.

The actual recv function does handle the offset well, so we can create a
temporary clone of the skb and pull that one up as required for parsing.

The pull itself has a cost if we are pulling beyond the head data,
measured to 2-3% latency in a noisy VM with a local client stressing
that path. The clone's impact seemed too small to measure.

This bug can be exhibited easily by implementing a "trivial" kcm parser
taking the first bytes as size, and on the client sending at least two
such packets in a single write().

Note that bpf sockmap has the same problem, both for parse and for recv,
so it would pulling twice or a real pull within the strparser logic if
anyone cares about that.

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 18:42:39 -07:00
zhong jiang
b458925ed5 net: dsa: remove redundant null pointer check before put_device
put_device has taken the null pinter check into account. So it is
safe to remove the duplicated check before put_device.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:20:11 -07:00
Zhu Yanjun
1635bb548f net: rds: use memset to optimize the recv
The function rds_inc_init is in recv process. To use memset can optimize
the function rds_inc_init.
The test result:

     Before:
     1) + 24.950 us   |        rds_inc_init [rds]();
     After:
     1) + 10.990 us   |        rds_inc_init [rds]();

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:19:51 -07:00
Hauke Mehrtens
711ddb625c net: dsa: tag_gswip: Add gswip to dsa_tag_protocol_to_str()
The gswip tag was missing in the dsa_tag_protocol_to_str() function, add it.

Fixes: 7969119293 ("net: dsa: Add Lantiq / Intel GSWIP tag support")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Daniel Borkmann
50c6b58a81 tls: fix currently broken MSG_PEEK behavior
In kTLS MSG_PEEK behavior is currently failing, strace example:

  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] listen(4, 10)               = 0
  [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] close(4)                    = 0
  [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64

As can be seen from strace, there are two TLS records sent,
i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
peeking 'test_read_peektest_read_peektest'. This is clearly
wrong, and what happens is that given peek cannot call into
tls_sw_advance_skb() to unpause strparser and proceed with
the next skb, we end up looping over the current one, copying
the 'test_read_peek' over and over into the user provided
buffer.

Here, we can only peek into the currently held skb (current,
full TLS record) as otherwise we would end up having to hold
all the original skb(s) (depending on the peek depth) in a
separate queue when unpausing strparser to process next
records, minimally intrusive is to return only up to the
current record's size (which likely was what c46234ebb4
("tls: RX path for ktls") originally intended as well). Thus,
after patch we properly peek the first record:

  [pid  2046] wait4(2075,  <unfinished ...>
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] listen(4, 10)               = 0
  [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] close(4)                    = 0
  [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14

Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:03:09 -07:00
John Fastabend
7a3dd8c897 tls: async support causes out-of-bounds access in crypto APIs
When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.

So this breaks (with general protection fault and KASAN error, if
enabled) because the request sent to decrypt is shorter than required
causing the crypto API out-of-bounds errors. Also it seems unlikely the
sk is even valid by the time it gets to the callback because of memset
in crypto layer.

Anyways, fix this by holding the sk in the skb->sk field when the
callback is set up and because the skb is already passed through to
the callback handler via void* we can access it in the handler. Then
in the handler we need to be careful to NULL the pointer again before
kfree_skb. I added comments on both the setup (in tls_do_decryption)
and when we clear it from the crypto callback handler
tls_decrypt_done(). After this selftests pass again and fixes KASAN
errors/warnings.

Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:01:36 -07:00
Eric Dumazet
bbd6528d28 ipv6: fix possible use-after-free in ip6_xmit()
In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.

Bring IPv6 in line with what we do in IPv4 to fix this.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 07:56:02 -07:00
Taehee Yoo
c8204cab9c netfilter: nat: remove unnecessary rcu_read_lock in nf_nat_redirect_ipv{4/6}
nf_nat_redirect_ipv4() and nf_nat_redirect_ipv6() are only called by
netfilter hook point. so that rcu_read_lock and rcu_read_unlock() are
unnecessary.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 16:11:14 +02:00
Pablo Neira Ayuso
4430b897a2 netfilter: cttimeout: remove superfluous check on layer 4 netlink functions
We assume they are always set accordingly since a874752a10
("netfilter: conntrack: timeout interface depend on
CONFIG_NF_CONNTRACK_TIMEOUT"), so we can get rid of this checks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 16:11:14 +02:00
Florian Westphal
7052ba4080 netfilter: nf_nat_ipv4: remove obsolete EXPORT_SYMBOL
There are no external callers anymore, previous change just
forgot to also remove the EXPORT_SYMBOL().

Fixes: 9971a514ed ("netfilter: nf_nat: add nat type hooks to nat core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 16:11:13 +02:00
Florian Westphal
70c0eb1ca0 netfilter: xtables: avoid BUG_ON
I see no reason for them, label or timer cannot be NULL, and if they
were, we'll crash with null deref anyway.

For skb_header_pointer failure, just set hotdrop to true and toss
such packet.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 16:11:12 +02:00
Florian Westphal
fa5950e498 netfilter: nf_tables: avoid BUG_ON usage
None of these spots really needs to crash the kernel.
In one two cases we can jsut report error to userspace, in the other
cases we can just use WARN_ON (and leak memory instead).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 16:11:12 +02:00
Pablo Neira Ayuso
0d704967f4 netfilter: xt_cgroup: shrink size of v2 path
cgroup v2 path field is PATH_MAX which is too large, this is placing too
much pressure on memory allocation for people with many rules doing
cgroup v1 classid matching, side effects of this are bug reports like:

https://bugzilla.kernel.org/show_bug.cgi?id=200639

This patch registers a new revision that shrinks the cgroup path to 512
bytes, which is the same approach we follow in similar extensions that
have a path field.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Tejun Heo <tj@kernel.org>
2018-09-17 16:11:03 +02:00
Kristian Evensen
59c08c69c2 netfilter: ctnetlink: Support L3 protocol-filter on flush
The same connection mark can be set on flows belonging to different
address families. This commit adds support for filtering on the L3
protocol when flushing connection track entries. If no protocol is
specified, then all L3 protocols match.

In order to avoid code duplication and a redundant check, the protocol
comparison in ctnetlink_dump_table() has been removed. Instead, a filter
is created if the GET-message triggering the dump contains an address
family. ctnetlink_filter_match() is then used to compare the L3
protocols.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 12:04:14 +02:00
Florian Westphal
6c47260250 netfilter: nf_tables: add xfrm expression
supports fetching saddr/daddr of tunnel mode states, request id and spi.
If direction is 'in', use inbound skb secpath, else dst->xfrm.

Joint work with Máté Eckl.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:40:08 +02:00
Florian Westphal
2953d80ff0 netfilter: remove obsolete need_conntrack stub
as of a0ae2562c6 ("netfilter: conntrack: remove l3proto
abstraction") there are no users anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:40:07 +02:00
Florian Westphal
0935d55884 netfilter: nf_tables: asynchronous release
Release the committed transaction log from a work queue, moving
expensive synchronize_rcu out of the locked section and providing
opportunity to batch this.

On my test machine this cuts runtime of nft-test.py in half.
Based on earlier patch from Pablo Neira Ayuso.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:40:07 +02:00
Florian Westphal
0ef235c717 netfilter: nf_tables: warn when expr implements only one of activate/deactivate
->destroy is only allowed to free data, or do other cleanups that do not
have side effects on other state, such as visibility to other netlink
requests.

Such things need to be done in ->deactivate.
As a transaction can fail, we need to make sure we can undo such
operations, therefore ->activate() has to be provided too.

So print a warning and refuse registration if expr->ops provides
only one of the two operations.

v2: fix nft_expr_check_ops to not repeat same check twice (Jones Desougi)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:40:06 +02:00
Florian Westphal
cd5125d8f5 netfilter: nf_tables: split set destruction in deactivate and destroy phase
Splits unbind_set into destroy_set and unbinding operation.

Unbinding removes set from lists (so new transaction would not
find it anymore) but keeps memory allocated (so packet path continues
to work).

Rebind function is added to allow unrolling in case transaction
that wants to remove set is aborted.

Destroy function is added to free the memory, but this could occur
outside of transaction in the future.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:29:49 +02:00
Florian Westphal
02b408fae3 netfilter: nf_tables: rt: allow checking if dst has xfrm attached
Useful e.g. to avoid NATting inner headers of to-be-encrypted packets.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17 11:29:49 +02:00
David S. Miller
0376d5dce0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-09-16

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix end boundary calculation in BTF for the type section, from Martin.

2) Fix and revert subtraction of pointers that was accidentally allowed
   for unprivileged programs, from Alexei.

3) Fix bpf_msg_pull_data() helper by using __GFP_COMP in order to avoid
   a warning in linearizing sg pages into a single one for large allocs,
   from Tushar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 17:47:03 -07:00
Haishuang Yan
a82738adff ip6_gre: simplify gre header parsing in ip6gre_err
Same as ip_gre, use gre_parse_header to parse gre header in gre error
handler code.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:32:59 -07:00
Haishuang Yan
b0350d51f0 ip_gre: fix parsing gre header in ipgre_err
gre_parse_header stops parsing when csum_err is encountered, which means
tpi->key is undefined and ip_tunnel_lookup will return NULL improperly.

This patch introduce a NULL pointer as csum_err parameter. Even when
csum_err is encountered, it won't return error and continue parsing gre
header as expected.

Fixes: 9f57c67c37 ("gre: Remove support for sharing GRE protocol hook.")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:32:59 -07:00
Davide Caratti
2d550dbad8 net/sched: act_police: don't use spinlock in the data path
use RCU instead of spinlocks, to protect concurrent read/write on
act_police configuration. This reduces the effects of contention in the
data path, in case multiple readers are present.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:30:22 -07:00
Davide Caratti
93be42f917 net/sched: act_police: use per-cpu counters
use per-CPU counters, instead of sharing a single set of stats with all
cores. This removes the need of using spinlock when statistics are read
or updated.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:30:22 -07:00
Paolo Abeni
eb63f2964d udp6: add missing checks on edumux packet processing
Currently the UDPv6 early demux rx code path lacks some mandatory
checks, already implemented into the normal RX code path - namely
the checksum conversion and no_check6_rx check.

Similar to the previous commit, we move the common processing to
an UDPv6 specific helper and call it from both edemux code path
and normal code path. In respect to the UDPv4, we need to add an
explicit check for non zero csum according to no_check6_rx value.

Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Xin Long <lucien.xin@gmail.com>
Fixes: c9f2c1ae12 ("udp6: fix socket leak on early demux")
Fixes: 2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:27:44 -07:00
Paolo Abeni
2b5a921740 udp4: fix IP_CMSG_CHECKSUM for connected sockets
commit 2abb7cdc0d ("udp: Add support for doing checksum
unnecessary conversion") left out the early demux path for
connected sockets. As a result IP_CMSG_CHECKSUM gives wrong
values for such socket when GRO is not enabled/available.

This change addresses the issue by moving the csum conversion to a
common helper and using such helper in both the default and the
early demux rx path.

Fixes: 2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:27:44 -07:00
Sven Eckelmann
a7ea49afcc batman-adv: Enable LockLess TX for softif
The batadv interfaces are virtual interfaces which just tunnel the traffic
over other ethernet compatible interfaces. It doesn't need serialization
during the tx phase and is using RCU for most of its internal
datastructures. Since it doesn't have actual queues which could be locked
independently, the throughput gets significantly reduced by the extra lock
in the core net code.

8 parallel TCP connections forwarded by an IPQ4019 based hardware over
5GHz could reach:

* without LLTX: 349 Mibit/s
* with LLTX:    563 Mibit/s

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-15 09:51:10 +02:00
Sven Eckelmann
dee222c7b2 batman-adv: Move OGM rebroadcast stats to orig_ifinfo
B.A.T.M.A.N. IV requires the number of rebroadcast from a neighboring
originator. These statistics are gathered per interface which transmitted
the OGM (and then received it again). Since an originator is not interface
specific, a resizable array was used in each originator.

This resizable array had an entry for each interface and had to be resizes
(for all OGMs) when the number of active interface was modified. This could
cause problems when a large number of interface is added and not enough
continuous memory is available to allocate the array.

There is already a per interface originator structure "batadv_orig_ifinfo"
which can be used to store this information.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-15 09:51:10 +02:00
Sven Eckelmann
d331a73994 batman-adv: Provide debug messages as trace events
A private debug logging infrastructure is currently provided via
$debug_fs/batman_adv/*/log when CONFIG_BATMAN_ADV_DEBUG is enabled. This is
not well integrated in the rest of the tracing infrastructure of the
kernel.

Other components (like mac80211 or ath10k) allow to gather the debug
messages using generic trace events which are better integrated. This makes
it possible to interact with them using the existing userspace tools.

The tracepoint batadv:batadv_dbg will now be available when
CONFIG_BATMAN_ADV_DEBUG and CONFIG_BATMAN_ADV_TRACING is activated. The log
level mask is still used for filtering as usual.

A full system trace for offline parsing can be created (and read) using:

  $ batctl ll all
  $ trace-cmd record -e batadv:batadv_dbg
  $ trace-cmd report

The same can also be done without recording to a file

  $ batctl ll all
  $ trace-cmd stream -e batadv:batadv_dbg

The trace infrastructure is especially helpful when tracing processes:

  $ batctl ll all
  $ ./tools/perf/perf trace --event "batadv:*" batctl p 10.204.32.1
     0.000 batadv:batadv_dbg:batman_adv bat0 Parsing outgoing ARP REQUEST
     0.045 batadv:batadv_dbg:batman_adv bat0 ARP MSG = [src: a2:64:14:53:f8:22-10.204.32.185 dst: 00:00:00:00:00:00-10.204.32.1]
     0.067 batadv:batadv_dbg:batman_adv bat0 Entry updated: 10.204.32.185 a2:64:14:53:f8:22 (vid: -1)
     0.099 batadv:batadv_dbg:batman_adv bat0 batadv_dat_select_candidates(): IP=10.204.32.1 hash(IP)=48902
     0.757 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 0: selected fe:2c:91:68:29:2b addr=48977 dist=65460
     1.178 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 1: selected fe:81:ab:c5:e3:03 addr=49181 dist=65256
     1.809 batadv:batadv_dbg:batman_adv bat0 dat_select_candidates() 2: selected 66:25:a7:48:37:fb addr=49328 dist=65109
     1.828 batadv:batadv_dbg:batman_adv bat0 DHT_SEND for 10.204.32.1

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-15 09:51:08 +02:00
Petar Penkov
d58e468b11 flow_dissector: implements flow dissector BPF hook
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
path. The BPF program is per-network namespace.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-14 12:04:33 -07:00
Sven Eckelmann
dabeb13eee batman-adv: Increase version number to 2018.3
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14 17:59:20 +02:00
Davide Caratti
34043d250f net/sched: act_sample: fix NULL dereference in the data path
Matteo reported the following splat, testing the datapath of TC 'sample':

 BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
 Read of size 8 at addr 0000000000000000 by task nc/433

 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
 Call Trace:
  kasan_report.cold.6+0x6c/0x2fa
  tcf_sample_act+0xc4/0x310
  ? dev_hard_start_xmit+0x117/0x180
  tcf_action_exec+0xa3/0x160
  tcf_classify+0xdd/0x1d0
  htb_enqueue+0x18e/0x6b0
  ? deref_stack_reg+0x7a/0xb0
  ? htb_delete+0x4b0/0x4b0
  ? unwind_next_frame+0x819/0x8f0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  __dev_queue_xmit+0x722/0xca0
  ? unwind_get_return_address_ptr+0x50/0x50
  ? netdev_pick_tx+0xe0/0xe0
  ? save_stack+0x8c/0xb0
  ? kasan_kmalloc+0xbe/0xd0
  ? __kmalloc_track_caller+0xe4/0x1c0
  ? __kmalloc_reserve.isra.45+0x24/0x70
  ? __alloc_skb+0xdd/0x2e0
  ? sk_stream_alloc_skb+0x91/0x3b0
  ? tcp_sendmsg_locked+0x71b/0x15a0
  ? tcp_sendmsg+0x22/0x40
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ip_finish_output2+0x495/0x590
  ? ip_copy_metadata+0x2e0/0x2e0
  ? skb_gso_validate_network_len+0x6f/0x110
  ? ip_finish_output+0x174/0x280
  __tcp_transmit_skb+0xb17/0x12b0
  ? __tcp_select_window+0x380/0x380
  tcp_write_xmit+0x913/0x1de0
  ? __sk_mem_schedule+0x50/0x80
  tcp_sendmsg_locked+0x49d/0x15a0
  ? tcp_rcv_established+0x8da/0xa30
  ? tcp_set_state+0x220/0x220
  ? clear_user+0x1f/0x50
  ? iov_iter_zero+0x1ae/0x590
  ? __fget_light+0xa0/0xe0
  tcp_sendmsg+0x22/0x40
  __sys_sendto+0x1b0/0x250
  ? __ia32_sys_getpeername+0x40/0x40
  ? _copy_to_user+0x58/0x70
  ? poll_select_copy_remaining+0x176/0x200
  ? __pollwait+0x1c0/0x1c0
  ? ktime_get_ts64+0x11f/0x140
  ? kern_select+0x108/0x150
  ? core_sys_select+0x360/0x360
  ? vfs_read+0x127/0x150
  ? kernel_write+0x90/0x90
  __x64_sys_sendto+0x6f/0x80
  do_syscall_64+0x5d/0x150
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fefef2b129d
 Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d
 RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003
 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8

tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
forgot to allocate them, because tcf_idr_create() was called with a wrong
value of 'cpustats'. Setting it to true proved to fix the reported crash.

Reported-by: Matteo Croce <mcroce@redhat.com>
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:46:28 -07:00
Sven Eckelmann
00caf6a2b3 batman-adv: Mark debugfs functionality as deprecated
CONFIG_BATMAN_ADV_DEBUGFS is disabled by default because debugfs is not
supported for batman-adv interfaces in any non-default netns. Any remaining
users of this interface should still be informed about the deprecation and
the generic netlink alternative.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14 10:50:26 +02:00
Simon Wunderlich
138c72efbd batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-09-14 10:50:26 +02:00
Johannes Berg
1cebf8f143 socket: fix struct ifreq size in compat ioctl
As reported by Reobert O'Callahan, since Viro's commit to kill
dev_ifsioc() we attempt to copy too much data in compat mode,
which may lead to EFAULT when the 32-bit version of struct ifreq
sits at/near the end of a page boundary, and the next page isn't
mapped.

Fix this by passing the approprate compat/non-compat size to copy
and using that, as before the dev_ifsioc() removal. This works
because only the embedded "struct ifmap" has different size, and
this is only used in SIOCGIFMAP/SIOCSIFMAP which has a different
handler. All other parts of the union are naturally compatible.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199469.

Fixes: bf4405737f ("kill dev_ifsioc()")
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 16:01:06 -07:00
Gustavo A. R. Silva
f91845da9f pktgen: Fix fall-through annotation
Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 15:36:41 -07:00
Toke Høiland-Jørgensen
c56cae23c6 gso_segment: Reset skb->mac_len after modifying network header
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:09:32 -07:00
Toke Høiland-Jørgensen
50c12f7401 gso_segment: Reset skb->mac_len after modifying network header
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:08:40 -07:00
David S. Miller
bffb9b51a2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2018-09-13

A few Bluetooth fixes for the 4.19-rc series:

 - Fixed rw_semaphore leak in hci_ldisc
 - Fixed local Out-of-Band pairing data handling

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:06:12 -07:00
Sabrina Dubroca
c844eb46b7 tls: clear key material from kernel memory when do_tls_setsockopt_conf fails
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:03:47 -07:00
Sabrina Dubroca
86029d10af tls: zero the crypto information from tls_context before freeing
This contains key material in crypto_send_aes_gcm_128 and
crypto_recv_aes_gcm_128.

Introduce union tls_crypto_context, and replace the two identical
unions directly embedded in struct tls_context with it. We can then
use this union to clean up the memory in the new tls_ctx_free()
function.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:03:47 -07:00
Sabrina Dubroca
7cba09c6d5 tls: don't copy the key out of tls12_crypto_info_aes_gcm_128
There's no need to copy the key to an on-stack buffer before calling
crypto_aead_setkey().

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:03:47 -07:00
Vasily Khoruzhick
f0e0d04413 neighbour: confirm neigh entries when ARP packet is received
Update 'confirmed' timestamp when ARP packet is received. It shouldn't
affect locktime logic and anyway entry can be confirmed by any higher-layer
protocol. Thus it makes sense to confirm it when ARP packet is received.

Fixes: 77d7123342 ("neighbour: update neigh timestamps iff update is effective")
Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:01:29 -07:00
Roopa Prabhu
56a49d7048 net: rtnl_configure_link: fix dev flags changes arg to __dev_notify_flags
This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071

Commit 5025f7f7d5 wrongly relied on __dev_change_flags to notify users of
dev flag changes in the case when dev->rtnl_link_state = RTNL_LINK_INITIALIZED.
Fix it by indicating flag changes explicitly to __dev_notify_flags.

Fixes: 5025f7f7d5 ("rtnetlink: add rtnl_link_state check in rtnl_configure_link")
Reported-By: Liam mcbirnie <liam.mcbirnie@boeing.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 11:01:32 -07:00
Cong Wang
f5b9bac745 net_sched: notify filter deletion when deleting a chain
When we delete a chain of filters, we need to notify
user-space we are deleting each filters in this chain
too.

Fixes: 32a4f5ecd7 ("net: sched: introduce chain object to uapi")
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:07:40 -07:00
Cong Wang
9708d2b5b7 llc: avoid blocking in llc_sap_close()
llc_sap_close() is called by llc_sap_put() which
could be called in BH context in llc_rcv(). We can't
block in BH.

There is no reason to block it here, kfree_rcu() should
be sufficient.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:04:58 -07:00
Xin Long
22d0bd82cc ipv6: use rt6_info members when dst is set in rt6_fill_node
In inet6_rtm_getroute, since Commit 93531c6743 ("net/ipv6: separate
handling of FIB entries from dst based routes"), it has used rt->from
to dump route info instead of rt.

However for some route like cache, some of its information like flags
or gateway is not the same as that of the 'from' one. It caused 'ip
route get' to dump the wrong route information.

In Jianlin's testing, the output information even lost the expiration
time for a pmtu route cache due to the wrong fib6_flags.

So change to use rt6_info members for dst addr, src addr, flags and
gateway when it tries to dump a route entry without fibmatch set.

v1->v2:
  - not use rt6i_prefsrc.
  - also fix the gw dump issue.

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:20:37 -07:00
Andre Naujoks
15033f0457 ipv6: Add sockopt IPV6_MULTICAST_ALL analogue to IP_MULTICAST_ALL
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.

A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.

Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.

The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:17:27 -07:00
Hauke Mehrtens
7969119293 net: dsa: Add Lantiq / Intel GSWIP tag support
This handles the tag added by the PMAC on the VRX200 SoC line.

The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.

This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 08:14:33 -07:00
David S. Miller
aaf9253025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-12 22:22:42 -07:00
Nikolay Aleksandrov
435f2e7cc0 net: bridge: add support for sticky fdb entries
Add support for entries which are "sticky", i.e. will not change their port
if they show up from a different one. A new ndm flag is introduced for that
purpose - NTF_STICKY. We allow to set it only to non-local entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 20:30:03 -07:00
Tushar Dave
4c3d795cb0 bpf: use __GFP_COMP while allocating page
Helper bpg_msg_pull_data() can allocate multiple pages while
linearizing multiple scatterlist elements into one shared page.
However, if the shared page has size > PAGE_SIZE, using
copy_page_to_iter() causes below warning.

e.g.
[ 6367.019832] WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825
page_copy_sane.part.8+0x0/0x8

To avoid above warning, use __GFP_COMP while allocating multiple
contiguous pages.

Fixes: 015632bb30 ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-12 23:47:28 +02:00
Cong Wang
12a78b026f tipc: check return value of __tipc_dump_start()
When __tipc_dump_start() fails with running out of memory,
we have no reason to continue, especially we should avoid
calling tipc_dump_done().

Fixes: 8f5c5fcf35 ("tipc: call start and done ops directly in __tipc_nl_compat_dumpit()")
Reported-and-tested-by: syzbot+3f8324abccfbf8c74a9f@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 13:15:04 -07:00
Cong Wang
cc4dfb7f70 rds: fix two RCU related problems
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.

Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.

The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.

Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com
Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:09:19 -07:00
Li RongQing
0041195d55 netlink: remove hash::nelems check in netlink_insert
The type of hash::nelems has been changed from size_t to atom_t
which in fact is int, so not need to check if BITS_PER_LONG, that
is bit number of size_t, is bigger than 32

and rht_grow_above_max() will be called to check if hashtable is
too big, ensure it can not bigger than 1<<31

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:08:44 -07:00
Vakul Garg
150085791a net/tls: Fixed return value when tls_complete_pending_work() fails
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has
been set to return value of tls_complete_pending_work(). This allows
return of proper error code if tls_complete_pending_work() fails.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 00:04:28 -07:00
Haishuang Yan
51dc63e391 erspan: fix error handling for erspan tunnel
When processing icmp unreachable message for erspan tunnel, tunnel id
should be erspan_net_id instead of ipgre_net_id.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:54 -07:00
Haishuang Yan
5a64506b5c erspan: return PACKET_REJECT when the appropriate tunnel is not found
If erspan tunnel hasn't been established, we'd better send icmp port
unreachable message after receive erspan packets.

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:50:53 -07:00
Willem de Bruijn
0297c1c2ea tcp: rate limit synflood warnings further
Convert pr_info to net_info_ratelimited to limit the total number of
synflood warnings.

Commit 946cedccbd ("tcp: Change possible SYN flooding messages")
rate limits synflood warnings to one per listener.

Workloads that open many listener sockets can still see a high rate of
log messages. Syzkaller is one frequent example.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 23:34:20 -07:00
David S. Miller
4ecdf77091 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for you net tree:

1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing.

2) Restore conntrack dependency on xt_cluster, from Martin Willi.

3) Fix splat with GSO skbs from the checksum target, from Florian Westphal.

4) Rework ct timeout support, the template strategy to attach custom timeouts
   is not correct since it will not work in conjunction with conntrack zones
   and we have a possible free after use when removing the rule due to missing
   refcounting. To fix these problems, do not use conntrack template at all
   and set custom timeout on the already valid conntrack object. This
   fix comes with a preparation patch to simplify timeout adjustment by
   initializating the first position of the timeout array for all of the
   existing trackers. Patchset from Florian Westphal.

5) Fix missing dependency on from IPv4 chain NAT type, from Florian.

6) Release chain reference counter from the flush path, from Taehee Yoo.

7) After flushing an iptables ruleset, conntrack hooks are unregistered
   and entries are left stale to be cleaned up by the timeout garbage
   collector. No TCP tracking is done on established flows by this time.
   If ruleset is reloaded, then hooks are registered again and TCP
   tracking is restored, which considers packets to be invalid. Clear
   window tracking to exercise TCP flow pickup from the middle given that
   history is lost for us. Again from Florian.

8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y
   and CONFIG_NF_CT_NETLINK_TIMEOUT=n.

9) Broken CT target due to returning incorrect type from
   ctnl_timeout_find_get().

10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner.

11) Missing conversion of hashlimit sysctl interface to new API, from
    Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11 21:17:30 -07:00
Anders Roxell
1edb6e035e net/core/filter: fix unused-variable warning
Building with CONFIG_INET=n will show the warning below:
net/core/filter.c: In function ‘____bpf_getsockopt’:
net/core/filter.c:4048:19: warning: unused variable ‘tp’ [-Wunused-variable]
  struct tcp_sock *tp;
                   ^~
net/core/filter.c:4046:31: warning: unused variable ‘icsk’ [-Wunused-variable]
  struct inet_connection_sock *icsk;
                               ^~~~
Move the variable declarations inside the {} block where they are used.

Fixes: 1e215300f1 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-11 14:54:52 -07:00
Matias Karhumaa
4ba5175f2c Bluetooth: Use correct tfm to generate OOB data
In case local OOB data was generated and other device initiated pairing
claiming that it has got OOB data, following crash occurred:

[  222.847853] general protection fault: 0000 [#1] SMP PTI
[  222.848025] CPU: 1 PID: 42 Comm: kworker/u5:0 Tainted: G         C        4.18.0-custom #4
[  222.848158] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  222.848307] Workqueue: hci0 hci_rx_work [bluetooth]
[  222.848416] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth]
[  222.848540] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48
[  222.848914] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293
[  222.849021] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a
[  222.849111] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800
[  222.849192] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800
[  222.849269] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00
[  222.849350] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101
[  222.849429] FS:  0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000
[  222.849518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.849586] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0
[  222.849671] Call Trace:
[  222.849745]  ? sc_send_public_key+0x110/0x2a0 [bluetooth]
[  222.849825]  ? sc_send_public_key+0x115/0x2a0 [bluetooth]
[  222.849925]  smp_recv_cb+0x959/0x2490 [bluetooth]
[  222.850023]  ? _cond_resched+0x19/0x40
[  222.850105]  ? mutex_lock+0x12/0x40
[  222.850202]  l2cap_recv_frame+0x109d/0x3420 [bluetooth]
[  222.850315]  ? l2cap_recv_frame+0x109d/0x3420 [bluetooth]
[  222.850426]  ? __switch_to_asm+0x34/0x70
[  222.850515]  ? __switch_to_asm+0x40/0x70
[  222.850625]  ? __switch_to_asm+0x34/0x70
[  222.850724]  ? __switch_to_asm+0x40/0x70
[  222.850786]  ? __switch_to_asm+0x34/0x70
[  222.850846]  ? __switch_to_asm+0x40/0x70
[  222.852581]  ? __switch_to_asm+0x34/0x70
[  222.854976]  ? __switch_to_asm+0x40/0x70
[  222.857475]  ? __switch_to_asm+0x40/0x70
[  222.859775]  ? __switch_to_asm+0x34/0x70
[  222.861218]  ? __switch_to_asm+0x40/0x70
[  222.862327]  ? __switch_to_asm+0x34/0x70
[  222.863758]  l2cap_recv_acldata+0x266/0x3c0 [bluetooth]
[  222.865122]  hci_rx_work+0x1c9/0x430 [bluetooth]
[  222.867144]  process_one_work+0x210/0x4c0
[  222.868248]  worker_thread+0x41/0x4d0
[  222.869420]  kthread+0x141/0x160
[  222.870694]  ? process_one_work+0x4c0/0x4c0
[  222.871668]  ? kthread_create_worker_on_cpu+0x90/0x90
[  222.872896]  ret_from_fork+0x35/0x40
[  222.874132] Modules linked in: algif_hash algif_skcipher af_alg rfcomm bnep btusb btrtl btbcm btintel snd_intel8x0 cmac intel_rapl_perf vboxvideo(C) snd_ac97_codec bluetooth ac97_bus joydev ttm snd_pcm ecdh_generic drm_kms_helper snd_timer snd input_leds drm serio_raw fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci psmouse libahci i2c_piix4 video e1000 pata_acpi
[  222.883153] fbcon_switch: detected unhandled fb_set_par error, error code -16
[  222.886774] fbcon_switch: detected unhandled fb_set_par error, error code -16
[  222.890503] ---[ end trace 6504aa7a777b5316 ]---
[  222.890541] RIP: 0010:compute_ecdh_secret+0x5a/0x270 [bluetooth]
[  222.890551] Code: 0c af f5 48 8b 3d 46 de f0 f6 ba 40 00 00 00 be c0 00 60 00 e8 b7 7b c5 f5 48 85 c0 0f 84 ea 01 00 00 48 89 c3 e8 16 0c af f5 <49> 8b 47 38 be c0 00 60 00 8b 78 f8 48 83 c7 48 e8 51 84 c5 f5 48
[  222.890555] RSP: 0018:ffffb1664087fbc0 EFLAGS: 00010293
[  222.890561] RAX: ffff8a5750d7dc00 RBX: ffff8a5671096780 RCX: ffffffffc08bc32a
[  222.890565] RDX: 0000000000000000 RSI: 00000000006000c0 RDI: ffff8a5752003800
[  222.890571] RBP: ffffb1664087fc60 R08: ffff8a57525280a0 R09: ffff8a5752003800
[  222.890576] R10: ffffb1664087fc70 R11: 0000000000000093 R12: ffff8a5674396e00
[  222.890581] R13: ffff8a574c2e79aa R14: ffff8a574c2e796a R15: 020e0e100d010101
[  222.890586] FS:  0000000000000000(0000) GS:ffff8a5752500000(0000) knlGS:0000000000000000
[  222.890591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.890594] CR2: 000055856016a038 CR3: 0000000110d2c005 CR4: 00000000000606e0

This commit fixes a bug where invalid pointer to crypto tfm was used for
SMP SC ECDH calculation when OOB was in use. Solution is to use same
crypto tfm than when generating OOB material on generate_oob() function.

This bug was introduced in commit c0153b0b90 ("Bluetooth: let the crypto
subsystem generate the ecc privkey"). Bug was found by fuzzing kernel SMP
implementation using Synopsys Defensics.

Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11 13:33:57 +02:00
Johan Hedberg
94f14e4728 Bluetooth: SMP: Fix trying to use non-existent local OOB data
A remote device may claim that it has received our OOB data, even
though we never geneated it. Add a new flag to track whether we
actually have OOB data, and ignore the remote peer's flag if haven't
generated OOB data.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-11 13:33:57 +02:00
Steffen Klassert
9e14379378 xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
Since commit 222d7dbd25 ("net: prevent dst uses after free")
skb_dst_force() might clear the dst_entry attached to the skb.
The xfrm code don't expect this to happen, so we crash with
a NULL pointer dereference in this case. Fix it by checking
skb_dst(skb) for NULL after skb_dst_force() and drop the packet
in cast the dst_entry was cleared.

Fixes: 222d7dbd25 ("net: prevent dst uses after free")
Reported-by: Tobias Hommel <netdev-list@genoetigt.de>
Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-09-11 11:28:25 +02:00
Cong Wang
1286df269f netfilter: xt_hashlimit: use s->file instead of s->private
After switching to the new procfs API, it is supposed to
retrieve the private pointer from PDE_DATA(file_inode(s->file)),
s->private is no longer referred.

Fixes: 1cd6718272 ("netfilter/x_tables: switch to proc_create_seq_private")
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:35:32 +02:00
Michal 'vorner' Vaner
ad18d7bf68 netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
NF_REPEAT places the packet at the beginning of the iptables chain
instead of accepting or rejecting it right away. The packet however will
reach the end of the chain and continue to the end of iptables
eventually, so it needs the same handling as NF_ACCEPT and NF_DROP.

Fixes: 368982cd7d ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks")
Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-11 01:31:47 +02:00