Commit graph

44819 commits

Author SHA1 Message Date
Parthasarathy Bhuvaragan
57d5f64d83 tipc: allocate user memory with GFP_KERNEL flag
Until now, we allocate memory always with GFP_ATOMIC flag.
When the system is under memory pressure and a user tries to send,
the send fails due to low memory. However, the user application
can wait for free memory if we allocate it using GFP_KERNEL flag.

In this commit, we use allocate memory with GFP_KERNEL for all user
allocation.

Reported-by: Rune Torgersen <runet@innovsys.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 13:31:53 -05:00
Jakub Sitnicki
02ca0423fd ip6_tunnel: Account for tunnel header in tunnel MTU
With ip6gre we have a tunnel header which also makes the tunnel MTU
smaller. We need to reserve room for it. Previously we were using up
space reserved for the Tunnel Encapsulation Limit option
header (RFC 2473).

Also, after commit b05229f442 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") our contract with the caller has
changed. Now we check if the packet length exceeds the tunnel MTU after
the tunnel header has been pushed, unlike before.

This is reflected in the check where we look at the packet length minus
the size of the tunnel header, which is already accounted for in tunnel
MTU.

Fixes: b05229f442 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 13:22:12 -05:00
Hangbin Liu
1666d49e1d mld: do not remove mld souce list info when set link down
This is an IPv6 version of commit 24803f38a5 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.

Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().

Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 12:44:59 -05:00
Linus Torvalds
2eabb8b8d6 Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for older
bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYfOk6AAoJECebzXlCjuG+Lj4QALaLKRRbIdrz6nmg7gUmpTWc
 CdW8NMbzwSCXmYoivsTHBlhXZKsi5vVjnFXMCM/P85ddmipXdcTFCDLmmNoKUQ0M
 jODlLX90ctaZKCDBVSaH4htAz2gkFv7z5IllX0YDQqHyiuzh/9KoV+AFCgPZPTpL
 O1XRmfWz+yJDydz4hb3i5f2JvMk9P/tCXLnheuxxTIMSl2/fIfgF81eWwDpFqcA2
 27+PyWWjZehVnZ77ca/mWJj2n0+gBINiKafcfF39NK/Hv2q4aauB3k7c4blecc9Q
 m/IT3mKifvHvdNCmvHD5s74h4OikEGYpqaSjonMptZnWgfM4/gtF7yTiQjsOMDx/
 w6W/tfHlGrvegpzhjaIaoZZ50EZp7xwGNNZYgH4J44kytYpolrhsOR6NqCLTqpej
 xG2Kd89ZtnAgc/7T7ET/1PqpZ8f9M9pyV3E8s36OvF4AYQUNrfzbWSTQcZy3WGBP
 YuoUCzacIbNbGgu4m6Zx5l/vKW5yn45xbUMp7T9S4WoxYMx6a5vViU0NiF7KsQDu
 pcDT92DZ57KJFtCw7Ig08ILKsSXmNApH5/4mIrkX3quZuH4j2XapEJ9u//fmfZBd
 Q+Sgv8RXcGELUJIg9yfmoWgPDA/oYslc7ynBV0lXLNgBuod//dGSlZ+6KfFFJYr8
 XVOxwPTiiBIlc9lvB9eA
 =tb4L
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for
  older bugs"

* tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux:
  svcrdma: avoid duplicate dma unmapping during error recovery
  sunrpc: don't call sleeping functions from the notifier block callbacks
  svcrpc: don't leak contexts on PROC_DESTROY
  nfsd: fix supported attributes for acl & labels
2017-01-16 09:34:37 -08:00
William Breathitt Gray
e4670b058a netfilter: Fix typo in NF_CONNTRACK Kconfig option description
The NF_CONNTRACK Kconfig option description makes an incorrect reference
to the "meta" expression where the "ct" expression would be correct.This
patch fixes the respective typographical error.

Fixes: d497c63527 ("netfilter: add help information to new nf_tables Kconfig options")
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-16 14:23:02 +01:00
Liping Zhang
d21e540b4d netfilter: nf_tables: fix possible oops when dumping stateful objects
When dumping nft stateful objects, if NFTA_OBJ_TABLE and NFTA_OBJ_TYPE
attributes are not specified either, filter will become NULL, so oops
will happen(actually nft utility will always set NFTA_OBJ_TABLE attr,
so I write a test program to make this happen):

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: nf_tables_dump_obj+0x17c/0x330 [nf_tables]
  [...]
  Call Trace:
  ? nf_tables_dump_obj+0x5/0x330 [nf_tables]
  ? __kmalloc_reserve.isra.35+0x31/0x90
  ? __alloc_skb+0x5b/0x1e0
  netlink_dump+0x124/0x2a0
  __netlink_dump_start+0x161/0x190
  nf_tables_getobj+0xe8/0x280 [nf_tables]

Fixes: a9fea2a3c3 ("netfilter: nf_tables: allow to filter stateful object dumps by type")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-16 14:23:02 +01:00
Liping Zhang
6443ebc3fd netfilter: rpfilter: fix incorrect loopback packet judgment
Currently, we check the existing rtable in PREROUTING hook, if RTCF_LOCAL
is set, we assume that the packet is loopback.

But this assumption is incorrect, for example, a packet encapsulated
in ipsec transport mode was received and routed to local, after
decapsulation, it would be delivered to local again, and the rtable
was not dropped, so RTCF_LOCAL check would trigger. But actually, the
packet was not loopback.

So for these normal loopback packets, we can check whether the in device
is IFF_LOOPBACK or not. For these locally generated broadcast/multicast,
we can check whether the skb->pkt_type is PACKET_LOOPBACK or not.

Finally, there's a subtle difference between nft fib expr and xtables
rpfilter extension, user can add the following nft rule to do strict
rpfilter check:
  # nft add rule x y meta iif eth0 fib saddr . iif oif != eth0 drop

So when the packet is loopback, it's better to store the in device
instead of the LOOPBACK_IFINDEX, otherwise, after adding the above
nft rule, locally generated broad/multicast packets will be dropped
incorrectly.

Fixes: f83a7ea207 ("netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too")
Fixes: f6d0cbcf09 ("netfilter: nf_tables: add fib expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-16 14:23:01 +01:00
David S. Miller
1a717fcf8b We have a number of fixes, in part because I was late
in actually sending them out - will try to do better in
 the future:
  * handle VHT opmode properly when hostapd is controlling
    full station state
  * two fixes for minimum channel width in mac80211
  * don't leave SMPS set to junk in HT capabilities
  * fix headroom when forwarding mesh packets, recently
    broken by another fix that failed to take into account
    frame encryption
  * fix the TID in null-data packets indicating EOSP (end
    of service period) in U-APSD
  * prevent attempting to use (and then failing which
    results in crashes) TXQs on stations that aren't added
    to the driver yet
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYeNveAAoJEGt7eEactAAdCjgP+gIIUjpH08MazRE9Nl6gqRGW
 G6EAoaQDQ8AjiVo8nIJ33X2jmHRIPcyelyfMhdww3AtMJRrvl9wVhQvwKWI4+l6D
 o8UVMBu66fq2luQA6AtOkkU6SkKwC7Se72Mdx6OA/zvdZz4/9Y4nZpOFPVwCbxqL
 UJRzjrhnbS51YgL/Y0s0Vp+7Rv7IYCuH9JORNarMO5sYxaVpLWihJhbShX1bOshw
 uFTHOjNRseImLhM4GOvVA7fSUiK8jxEuMECmlDKQB/6nVxMskE54yLOqMB5Ys6va
 2CKjp5xqM4FfqB4LMNp7soJAXUOXvqk25JXAAbkVNo4VRd2Y4GoJ7XdBPqd4kfKb
 rPlr+UP2xaSQsfqIoe+uFr/lVUmm4oCTrS/Mo7YVrjSMU7fntYwZccr9aw5jrQFW
 YU+1QTF25HEb1SL18FH9JNWpoTlOlpB3bwpbAW4BHzsqDYc76CE/oyDLdZ6zQYlu
 z92cLuycGTwgNhi1zDNThxB81zhZsWH1Jbh9ppZBdDPxo6E4DMK71okreGAXBEMJ
 IdFZZqJbYW4I/TsUtse4atozk6oXlTEFY/pX4Qm0gwGwmqQx+wfSNLbymsD7gR42
 OkL+bZ701HBl2gJUWuICrM2lFWtD4/o6oWpaW2I7QUAQhjvUr4ld9kthVMF2vdlt
 w316EIfZQzBKw7Z7zjZI
 =GU6q
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
We have a number of fixes, in part because I was late
in actually sending them out - will try to do better in
the future:
 * handle VHT opmode properly when hostapd is controlling
   full station state
 * two fixes for minimum channel width in mac80211
 * don't leave SMPS set to junk in HT capabilities
 * fix headroom when forwarding mesh packets, recently
   broken by another fix that failed to take into account
   frame encryption
 * fix the TID in null-data packets indicating EOSP (end
   of service period) in U-APSD
 * prevent attempting to use (and then failing which
   results in crashes) TXQs on stations that aren't added
   to the driver yet
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-15 22:17:59 -05:00
Lance Richardson
75f01a4c9c openvswitch: maintain correct checksum state in conntrack actions
When executing conntrack actions on skbuffs with checksum mode
CHECKSUM_COMPLETE, the checksum must be updated to account for
header pushes and pulls. Otherwise we get "hw csum failure"
logs similar to this (ICMP packet received on geneve tunnel
via ixgbe NIC):

[  405.740065] genev_sys_6081: hw csum failure
[  405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G          I     4.10.0-rc3+ #1
[  405.740108] Call Trace:
[  405.740110]  <IRQ>
[  405.740113]  dump_stack+0x63/0x87
[  405.740116]  netdev_rx_csum_fault+0x3a/0x40
[  405.740118]  __skb_checksum_complete+0xcf/0xe0
[  405.740120]  nf_ip_checksum+0xc8/0xf0
[  405.740124]  icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
[  405.740132]  nf_conntrack_in+0xe1/0x550 [nf_conntrack]
[  405.740137]  ? find_bucket.isra.2+0x62/0x70 [openvswitch]
[  405.740143]  __ovs_ct_lookup+0x95/0x980 [openvswitch]
[  405.740145]  ? netif_rx_internal+0x44/0x110
[  405.740149]  ovs_ct_execute+0x147/0x4b0 [openvswitch]
[  405.740153]  do_execute_actions+0x22e/0xa70 [openvswitch]
[  405.740157]  ovs_execute_actions+0x40/0x120 [openvswitch]
[  405.740161]  ovs_dp_process_packet+0x84/0x120 [openvswitch]
[  405.740166]  ovs_vport_receive+0x73/0xd0 [openvswitch]
[  405.740168]  ? udp_rcv+0x1a/0x20
[  405.740170]  ? ip_local_deliver_finish+0x93/0x1e0
[  405.740172]  ? ip_local_deliver+0x6f/0xe0
[  405.740174]  ? ip_rcv_finish+0x3a0/0x3a0
[  405.740176]  ? ip_rcv_finish+0xdb/0x3a0
[  405.740177]  ? ip_rcv+0x2a7/0x400
[  405.740180]  ? __netif_receive_skb_core+0x970/0xa00
[  405.740185]  netdev_frame_hook+0xd3/0x160 [openvswitch]
[  405.740187]  __netif_receive_skb_core+0x1dc/0xa00
[  405.740194]  ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
[  405.740197]  __netif_receive_skb+0x18/0x60
[  405.740199]  netif_receive_skb_internal+0x40/0xb0
[  405.740201]  napi_gro_receive+0xcd/0x120
[  405.740204]  gro_cell_poll+0x57/0x80 [geneve]
[  405.740206]  net_rx_action+0x260/0x3c0
[  405.740209]  __do_softirq+0xc9/0x28c
[  405.740211]  irq_exit+0xd9/0xf0
[  405.740213]  do_IRQ+0x51/0xd0
[  405.740215]  common_interrupt+0x93/0x93

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-15 22:15:16 -05:00
Shannon Nelson
003c941057 tcp: fix tcp_fastopen unaligned access complaints on sparc
Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.

This addresses log complaints like these:
    log_unaligned: 113 callbacks suppressed
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
    Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
    Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
    Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 12:31:24 -05:00
David Lebrun
fa79581ea6 ipv6: sr: fix several BUGs when preemption is enabled
When CONFIG_PREEMPT=y, CONFIG_IPV6=m and CONFIG_SEG6_HMAC=y,
seg6_hmac_init() is called during the initialization of the ipv6 module.
This causes a subsequent call to smp_processor_id() with preemption
enabled, resulting in the following trace.

[   20.451460] BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1
[   20.452556] caller is debug_smp_processor_id+0x17/0x19
[   20.453304] CPU: 0 PID: 1 Comm: systemd Not tainted 4.9.0-rc5-00973-g46738b1 #1
[   20.454406]  ffffc9000062fc18 ffffffff813607b2 0000000000000000 ffffffff81a7f782
[   20.455528]  ffffc9000062fc48 ffffffff813778dc 0000000000000000 00000000001dcf98
[   20.456539]  ffffffffa003bd08 ffffffff81af93e0 ffffc9000062fc58 ffffffff81377905
[   20.456539] Call Trace:
[   20.456539]  [<ffffffff813607b2>] dump_stack+0x63/0x7f
[   20.456539]  [<ffffffff813778dc>] check_preemption_disabled+0xd1/0xe3
[   20.456539]  [<ffffffff81377905>] debug_smp_processor_id+0x17/0x19
[   20.460260]  [<ffffffffa0061f3b>] seg6_hmac_init+0xfa/0x192 [ipv6]
[   20.460260]  [<ffffffffa0061ccc>] seg6_init+0x39/0x6f [ipv6]
[   20.460260]  [<ffffffffa006121a>] inet6_init+0x21a/0x321 [ipv6]
[   20.460260]  [<ffffffffa0061000>] ? 0xffffffffa0061000
[   20.460260]  [<ffffffff81000457>] do_one_initcall+0x8b/0x115
[   20.460260]  [<ffffffff811328a3>] do_init_module+0x53/0x1c4
[   20.460260]  [<ffffffff8110650a>] load_module+0x1153/0x14ec
[   20.460260]  [<ffffffff81106a7b>] SYSC_finit_module+0x8c/0xb9
[   20.460260]  [<ffffffff81106a7b>] ? SYSC_finit_module+0x8c/0xb9
[   20.460260]  [<ffffffff81106abc>] SyS_finit_module+0x9/0xb
[   20.460260]  [<ffffffff810014d1>] do_syscall_64+0x62/0x75
[   20.460260]  [<ffffffff816834f0>] entry_SYSCALL64_slow_path+0x25/0x25

Moreover, dst_cache_* functions also call smp_processor_id(), generating
a similar trace.

This patch uses raw_cpu_ptr() in seg6_hmac_init() rather than this_cpu_ptr()
and disable preemption when using dst_cache_* functions.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 12:29:55 -05:00
Michal Kazior
dbef536211 mac80211: prevent skb/txq mismatch
Station structure is considered as not uploaded
(to driver) until drv_sta_state() finishes. This
call is however done after the structure is
attached to mac80211 internal lists and hashes.
This means mac80211 can lookup (and use) station
structure before it is uploaded to a driver.

If this happens (structure exists, but
sta->uploaded is false) fast_tx path can still be
taken. Deep in the fastpath call the sta->uploaded
is checked against to derive "pubsta" argument for
ieee80211_get_txq(). If sta->uploaded is false
(and sta is actually non-NULL) ieee80211_get_txq()
effectively downgraded to vif->txq.

At first glance this may look innocent but coerces
mac80211 into a state that is almost guaranteed
(codel may drop offending skb) to crash because a
station-oriented skb gets queued up on
vif-oriented txq. The ieee80211_tx_dequeue() ends
up looking at info->control.flags and tries to use
txq->sta which in the fail case is NULL.

It's probably pointless to pretend one can
downgrade skb from sta-txq to vif-txq.

Since downgrading unicast traffic to vif->txq must
not be done there's no txq to put a frame on if
sta->uploaded is false. Therefore the code is made
to fall back to regular tx() op path if the
described condition is hit.

Only drivers using wake_tx_queue were affected.

Example crash dump before fix:

 Unable to handle kernel paging request at virtual address ffffe26c
 PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211]
 [<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from
 [<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core])
 [<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from
 [<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core])
 [<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core])
 [<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci])
 [<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from
 [<c0572e90>] (net_rx_action+0xac/0x160)

Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 14:47:21 +01:00
Felix Fietkau
43071d8fb3 mac80211: initialize SMPS field in HT capabilities
ibss and mesh modes copy the ht capabilites from the band without
overriding the SMPS state. Unfortunately the default value 0 for the
SMPS field means static SMPS instead of disabled.

This results in HT ibss and mesh setups using only single-stream rates,
even though SMPS is not supposed to be active.

Initialize SMPS to disabled for all bands on ieee80211_hw_register to
ensure that the value is sane where it is not overriden with the real
SMPS state.

Reported-by: Elektra Wagenrad <onelektra@gmx.net>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[move VHT TODO comment to a better place]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 11:31:26 +01:00
Sriharsha Basavapatna
ce1ca7d2d1 svcrdma: avoid duplicate dma unmapping during error recovery
In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
problems when the iova being unmapped is subsequently reused. Remove
the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
handle it.

Fixes: 412a15c0fe ("svcrdma: Port to new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 16:14:47 -05:00
Scott Mayhew
546125d161 sunrpc: don't call sleeping functions from the notifier block callbacks
The inet6addr_chain is an atomic notifier chain, so we can't call
anything that might sleep (like lock_sock)... instead of closing the
socket from svc_age_temp_xprts_now (which is called by the notifier
function), just have the rpc service threads do it instead.

Cc: stable@vger.kernel.org
Fixes: c3d4879e01 "sunrpc: Add a function to close..."
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 15:56:40 -05:00
J. Bruce Fields
78794d1890 svcrpc: don't leak contexts on PROC_DESTROY
Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future.  This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests.  We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Cc: stable@vger.kernel.org
Fixes: c5b29f885a "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 15:56:14 -05:00
David Ahern
8a430ed50b net: ipv4: fix table id in getroute response
rtm_table is an 8-bit field while table ids are allowed up to u32. Commit
709772e6e0 ("net: Fix routing tables with id > 255 for legacy software")
added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the
table id is > 255. The table id returned on get route requests should do
the same.

Fixes: c36ba6603a ("net: Allow user to get table id from route lookup")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:18:20 -05:00
David Ahern
ea7a80858f net: lwtunnel: Handle lwtunnel_fill_encap failure
Handle failure in lwtunnel_fill_encap adding attributes to skb.

Fixes: 571e722676 ("ipv4: support for fib route lwtunnel encap attributes")
Fixes: 19e42e4515 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:11:43 -05:00
Linus Torvalds
ba836a6f5a Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "27 fixes.

  There are three patches that aren't actually fixes. They're simple
  function renamings which are nice-to-have in mainline as ongoing net
  development depends on them."

* akpm: (27 commits)
  timerfd: export defines to userspace
  mm/hugetlb.c: fix reservation race when freeing surplus pages
  mm/slab.c: fix SLAB freelist randomization duplicate entries
  zram: support BDI_CAP_STABLE_WRITES
  zram: revalidate disk under init_lock
  mm: support anonymous stable page
  mm: add documentation for page fragment APIs
  mm: rename __page_frag functions to __page_frag_cache, drop order from drain
  mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
  mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
  mm: don't dereference struct page fields of invalid pages
  mailmap: add codeaurora.org names for nameless email commits
  signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
  mm: pmd dirty emulation in page fault handler
  ipc/sem.c: fix incorrect sem_lock pairing
  lib/Kconfig.debug: fix frv build failure
  mm: get rid of __GFP_OTHER_NODE
  mm: fix remote numa hits statistics
  mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
  ocfs2: fix crash caused by stale lvb with fsdlm plugin
  ...
2017-01-11 11:15:15 -08:00
Johannes Berg
d2941df8fb mac80211: recalculate min channel width on VHT opmode changes
When an associated station changes its VHT operating mode this
can/will affect the bandwidth it's using, and consequently we
must recalculate the minimum bandwidth we need to use. Failure
to do so can lead to one of two scenarios:
 1) we use a too high bandwidth, this is benign
 2) we use a too narrow bandwidth, causing rate control and
    actual PHY configuration to be out of sync, which can in
    turn cause problems/crashes

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:35:05 +01:00
Johannes Berg
96aa2e7cf1 mac80211: calculate min channel width correctly
In the current minimum chandef code there's an issue in that the
recalculation can happen after rate control is initialized for a
station that has a wider bandwidth than the current chanctx, and
then rate control can immediately start using those higher rates
which could cause problems.

Observe that first of all that this problem is because we don't
take non-associated and non-uploaded stations into account. The
restriction to non-associated is quite pointless and is one of
the causes for the problem described above, since the rate init
will happen before the station is set to associated; no frames
could actually be sent until associated, but the rate table can
already contain higher rates and that might cause problems.

Also, rejecting non-uploaded stations is wrong, since the rate
control can select higher rates for those as well.

Secondly, it's then necessary to recalculate the minimal config
before initializing rate control, so that when rate control is
initialized, the higher rates are already available. This can be
done easily by adding the necessary function call in rate init.

Change-Id: Ib9bc02d34797078db55459d196993f39dcd43070
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:51 +01:00
Beni Lev
06f7c88c10 cfg80211: consider VHT opmode on station update
Currently, this attribute is only fetched on station addition, but
not on station change. Since this info is only present in the assoc
request, with full station state support in the driver it cannot be
present when the station is added.

Thus, add support for changing the VHT opmode on station update if
done before (or while) the station is marked as associated. After
this, ignore it, since it used to be ignored.

Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:25 +01:00
Emmanuel Grumbach
d7f842442f mac80211: fix the TID on NDPs sent as EOSP carrier
In the commit below, I forgot to translate the mac80211's
AC to QoS IE order. Moreover, the condition in the if was
wrong. Fix both issues.
This bug would hit only with clients that didn't set all
the ACs as delivery enabled.

Fixes: f438ceb81d ("mac80211: uapsd_queues is in QoS IE order")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:32:58 +01:00
Cedric Izoard
c38c39bf7c mac80211: Fix headroom allocation when forwarding mesh pkt
This patch fix issue introduced by my previous commit that
tried to ensure enough headroom was present, and instead
broke it.

When forwarding mesh pkt, mac80211 may also add security header,
and it must therefore be taken into account in the needed headroom.

Fixes: d8da0b5d64 ("mac80211: Ensure enough headroom when forwarding mesh pkt")
Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:07:29 +01:00
Colin Ian King
eb004603c8 sctp: Fix spelling mistake: "Atempt" -> "Attempt"
Trivial fix to spelling mistake in WARN_ONCE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 10:01:01 -05:00
David Ahern
7a18c5b9fb net: ipv4: Fix multipath selection with vrf
fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b30f ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:59:55 -05:00
Arnd Bergmann
73b3514735 cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:

warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)

I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.

Fixes: 483c4933ea ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:47:10 -05:00
Eric Dumazet
7cfd5fd5a9 gro: use min_t() in skb_gro_reset_offset()
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 08:15:40 -05:00
Alexander Duyck
8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Herbert Xu
57ea52a865 gro: Disable frag0 optimization on IPv6 ext headers
The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0ef ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:30:33 -05:00
Herbert Xu
1272ce87fa gro: Enter slow-path if there is no tailroom
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:26:12 -05:00
Julian Wiedmann
dc5367bcc5 net/af_iucv: don't use paged skbs for TX on HiperSockets
With commit e53743994e
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.

cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: e53743994e ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:08:29 -05:00
Anna, Suman
5d722b3024 net: add the AF_QIPCRTR entries to family name tables
Commit bdabad3e36 ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:50:59 -05:00
Stephen Boyd
3512a1ad56 net: qrtr: Mark 'buf' as little endian
Failure to mark this pointer as __le32 causes checkers like
sparse to complain:

net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:274:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:274:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:275:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:275:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:276:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:276:16:    got restricted __le32 [usertype] <noident>

Silence it.

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:45:04 -05:00
Florian Fainelli
faf3a932fb net: dsa: Ensure validity of dst->ds[0]
It is perfectly possible to have non zero indexed switches being present
in a DSA switch tree, in such a case, we will be deferencing a NULL
pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.

Fixes: 0c73c523cf ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:18:52 -05:00
Eric Dumazet
d9584d8ccc net: skb_flow_get_be16() can be static
Removes following sparse complain :

net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?

Fixes: 972d3876fa ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 13:30:13 -05:00
Tobias Klauser
dc647ec88e net: socket: Make unnecessarily global sockfs_setattr() static
Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 11:29:50 -05:00
Eric Dumazet
6bb629db5e tcp: do not export tcp_peer_is_proven()
After commit 1fb6f159fd ("tcp: add tcp_conn_request"),
tcp_peer_is_proven() no longer needs to be exported.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:39 -05:00
Pavel Tikhomirov
b007f09072 ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:38 -05:00
Alexander Alemayhu
67c408cfa8 ipv6: fix typos
o s/approriate/appropriate
o s/discouvery/discovery

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:15 -05:00
Pau Espin Pedrol
bf99b4ded5 tcp: fix mark propagation with fwmark_reflect enabled
Otherwise, RST packets generated by the TCP stack for non-existing
sockets always have mark 0.
The mark from the original packet is assigned to the netns_ipv4/6
socket used to send the response so that it can get copied into the
response skb when the socket sends it.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 18:01:03 +01:00
Pau Espin Pedrol
cc31d43b41 netfilter: use fwmark_reflect in nf_send_reset
Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 18:01:03 +01:00
Johannes Berg
eeb0d56fab mac80211: implement multicast forwarding on fast-RX path
In AP (or VLAN) mode, when unicast 802.11 packets are received,
they might actually be multicast after conversion. In this case
the fast-RX path didn't handle them properly to send them back
to the wireless medium. Implement that by copying the SKB and
sending it back out.

The possible alternative would be to just punt the packet back
to the regular (slow) RX path, but since we have almost all of
the required code here already it's not so complicated to add
here. Punting it back would also mean acquiring the spinlock,
which would be bad for the stated purpose of the fast-RX path,
to enable well-performing parallel RX.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-09 11:12:04 +01:00
Paul Moore
bcd5e1a49f netlabel: add CALIPSO to the list of built-in protocols
When we added CALIPSO support in Linux v4.8 we forgot to add it to the
list of supported protocols with display at boot.

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:20:45 -05:00
David S. Miller
219a808fa1 Another single fix, to correctly handle destruction of a
single netlink socket having ownership of multiple objects
 (scheduled scan requests and interfaces.)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYb47lAAoJEGt7eEactAAdGg0QAJBevWlObLIDrFJIzoq7caPc
 UV9/QqNCi8RzU0FrJBMn8xOWvTTEiL5WNLm2SstKg4SQNYCfn0DNTw8GTfeQiAtr
 ppbkDm+sEyANVOEneyMggEJlD1C0h3t8M7z6qYWJRuHndBrLzS/Safv44YNhQOeg
 dkOwwlAgc5VHxOB2pFfAWS6N+F0l+uqLE30NomOVcNZv19PRZgPq5VA9IqrXbNMl
 r9/cDYp9d3tSPB/LGd6rQLT6t4bTDKiYKtdxEkrBf8ceFLMhCc/IAzv+o92cSx6H
 r5ZrJ08rUgcJfgtZi88EK7phF+tQjR94pBnvKfyH9afsyLYzTNrdBrUuugTkKnVi
 J4oN3Sc/rhw3yagA1uXDzNXj6RtMKw+c8UUYuGNt+1nv9Hx6ZeDoVpT+Ms21JNTz
 ejSc10I+E5GjjBOSj+w8r0ofgDlK1B0+S3cs5M4mwS8d0W0ibDcanaru5DIwfnd2
 w1UJiTLQHnFG4Gp9LdoNlF1lXPfaQlXbXeXc0OlmQhnBL82lqbR14ast1dRk1HXi
 LpEqJFgToUmiVemjYsepI9//gK4vz3BCokOnjtnMtxVEUyv1Ud2eDPHx0DrKu+Oq
 JAJY4q6g8R2k8vRL+eupFIEeZTyz++7Yh2YsWv26EgB2mNvE1LCVBPeUg30wrX6+
 BcyYx4lgXlkeHH0kxnVA
 =dUOq
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Another single fix, to correctly handle destruction of a
single netlink socket having ownership of multiple objects
(scheduled scan requests and interfaces.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 16:26:19 -05:00
David Forster
93e246f783 vti6: fix device register to report IFLA_INFO_KIND
vti6 interface is registered before the rtnl_link_ops block
is attached. As a result the resulting RTM_NEWLINK is missing
IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix.

Signed-off-by: Dave Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 16:09:09 -05:00
David S. Miller
d896b3120b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains accumulated Netfilter fixes for your
net tree:

1) Ensure quota dump and reset happens iff we can deliver numbers to
   userspace.

2) Silence splat on incorrect use of smp_processor_id() from nft_queue.

3) Fix an out-of-bound access reported by KASAN in
   nf_tables_rule_destroy(), patch from Florian Westphal.

4) Fix layer 4 checksum mangling in the nf_tables payload expression
   with IPv6.

5) Fix a race in the CLUSTERIP target from control plane path when two
   threads run to add a new configuration object. Serialize invocations
   of clusterip_config_init() using spin_lock. From Xin Long.

6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
   the br_nf_pre_routing_finish() hook. From Artur Molchanov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-05 11:49:57 -05:00
Johannes Berg
753aacfd2e nl80211: fix sched scan netlink socket owner destruction
A single netlink socket might own multiple interfaces *and* a
scheduled scan request (which might belong to another interface),
so when it goes away both may need to be destroyed.

Remove the schedule_scan_stop indirection to fix this - it's only
needed for interface destruction because of the way this works
right now, with a single work taking care of all interfaces.

Cc: stable@vger.kernel.org
Fixes: 93a1e86ce1 ("nl80211: Stop scheduled scan if netlink client disappears")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-05 10:59:53 +01:00
Sven Eckelmann
4ea33ef0f9 batman-adv: Decrease hardif refcnt on fragmentation send error
An error before the hardif is found has to free the skb. But every error
after that has to free the skb + put the hard interface.

Fixes: 8def0be82d ("batman-adv: Consume skb in batadv_frag_send_packet")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-04 08:22:04 +01:00
Reiter Wolfgang
3b48ab2248 drop_monitor: consider inserted data in genlmsg_end
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.

This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:09:44 -05:00
Alexander Duyck
5350d54f6c ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules
In the case of custom rules being present we need to handle the case of the
LOCAL table being intialized after the new rule has been added.  To address
that I am adding a new check so that we can make certain we don't use an
alias of MAIN for LOCAL when allocating a new table.

Fixes: 0ddcf43d5d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Oliver Brunel <jjk@jjacky.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 09:38:34 -05:00
Michal Tesar
7ababb7826 igmp: Make igmp group member RFC 3376 compliant
5.2. Action on Reception of a Query

 When a system receives a Query, it does not respond immediately.
 Instead, it delays its response by a random amount of time, bounded
 by the Max Resp Time value derived from the Max Resp Code in the
 received Query message.  A system may receive a variety of Queries on
 different interfaces and of different kinds (e.g., General Queries,
 Group-Specific Queries, and Group-and-Source-Specific Queries), each
 of which may require its own delayed response.

 Before scheduling a response to a Query, the system must first
 consider previously scheduled pending responses and in many cases
 schedule a combined response.  Therefore, the system must be able to
 maintain the following state:

 o A timer per interface for scheduling responses to General Queries.

 o A per-group and interface timer for scheduling responses to Group-
   Specific and Group-and-Source-Specific Queries.

 o A per-group and interface list of sources to be reported in the
   response to a Group-and-Source-Specific Query.

 When a new Query with the Router-Alert option arrives on an
 interface, provided the system has state to report, a delay for a
 response is randomly selected in the range (0, [Max Resp Time]) where
 Max Resp Time is derived from Max Resp Code in the received Query
 message.  The following rules are then used to determine if a Report
 needs to be scheduled and the type of Report to schedule.  The rules
 are considered in order and only the first matching rule is applied.

 1. If there is a pending response to a previous General Query
    scheduled sooner than the selected delay, no additional response
    needs to be scheduled.

 2. If the received Query is a General Query, the interface timer is
    used to schedule a response to the General Query after the
    selected delay.  Any previously pending response to a General
    Query is canceled.
--8<--

Currently the timer is rearmed with new random expiration time for
every incoming query regardless of possibly already pending report.
Which is not aligned with the above RFE.
It also might happen that higher rate of incoming queries can
postpone the report after the expiration time of the first query
causing group membership loss.

Now the per interface general query timer is rearmed only
when there is no pending report already scheduled on that interface or
the newly selected expiration time is before the already pending
scheduled report.

Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 13:01:03 -05:00
Ian Kumlien
d0af683407 flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.

skb_header_pointer will use skb->data and thus:
[  109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[  109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.557263] PGD 0
[  109.557338]
[  109.557484] Oops: 0000 [#1] SMP
[  109.557562] Modules linked in: chaoskey
[  109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
[  109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
[  109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
[  109.558041] RIP: 0010:[<ffffffff88dc02f8>]  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.558203] RSP: 0018:ffff94087fc83d40  EFLAGS: 00010206
[  109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
[  109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
[  109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
[  109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
[  109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
[  109.558979] FS:  0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
[  109.559326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
[  109.559753] Stack:
[  109.559957]  000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
[  109.560578]  0000000000000001 0000000000002000 0000000000000000 0000000000000000
[  109.561200]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  109.561820] Call Trace:
[  109.562027]  <IRQ>
[  109.562108]  [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0
[  109.562522]  [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80
[  109.562737]  [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350
[  109.562953]  [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280
[  109.563169]  [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0
[  109.563382]  [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0
[  109.563597]  [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f
[  109.563810]  <EOI>
[  109.563890]  [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0
[  109.564304]  [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0
[  109.564520]  [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0
[  109.564737]  [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140
[  109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
24 84
[  109.569959] RIP  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.570245]  RSP <ffff94087fc83d40>
[  109.570453] CR2: 0000000000000080

Fixes: ab10dccb11 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 12:53:34 -05:00
Johannes Berg
35f432a03e mac80211: initialize fast-xmit 'info' later
In ieee80211_xmit_fast(), 'info' is initialized to point to the skb
that's passed in, but that skb may later be replaced by a clone (if
it was shared), leading to an invalid pointer.

This can lead to use-after-free and also later crashes since the
real SKB's info->hw_queue doesn't get initialized properly.

Fix this by assigning info only later, when it's needed, after the
skb replacement (may have) happened.

Cc: stable@vger.kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02 11:28:25 +01:00
Guillaume Nault
a9b2dff80b l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups
For connected sockets, __l2tp_ip{,6}_bind_lookup() needs to check the
remote IP when looking for a matching socket. Otherwise a connected
socket can receive traffic not originating from its peer.

Drop l2tp_ip_bind_lookup() and l2tp_ip6_bind_lookup() instead of
updating their prototype, as these functions aren't used.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:07:20 -05:00
Guillaume Nault
97b84fd6d9 l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookup
An L2TP socket bound to the unspecified address should match with any
address. If not, it can't receive any packet and __l2tp_ip6_bind_lookup()
can't prevent another socket from binding on the same device/tunnel ID.

While there, rename the 'addr' variable to 'sk_laddr' (local addr), to
make following patch clearer.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:07:20 -05:00
Reiter Wolfgang
4200462d88 drop_monitor: add missing call to genlmsg_end
Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:00:26 -05:00
Eric Biggers
e1a3a60a2e net: socket: don't set sk_uid to garbage value in ->setattr()
->setattr() was recently implemented for socket files to sync the socket
inode's uid to the new 'sk_uid' member of struct sock.  It does this by
copying over the ia_uid member of struct iattr.  However, ia_uid is
actually only valid when ATTR_UID is set in ia_valid, indicating that
the uid is being changed, e.g. by chown.  Other metadata operations such
as chmod or utimes leave ia_uid uninitialized.  Therefore, sk_uid could
be set to a "garbage" value from the stack.

Fix this by only copying the uid over when ATTR_UID is set.

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 11:53:34 -05:00
Artur Molchanov
14221cc45c bridge: netfilter: Fix dropping packets that moving through bridge interface
Problem:
br_nf_pre_routing_finish() calls itself instead of
br_nf_pre_routing_finish_bridge(). Due to this bug reverse path filter drops
packets that go through bridge interface.

User impact:
Local docker containers with bridge network can not communicate with each
other.

Fixes: c5136b15ea ("netfilter: bridge: add and use br_nf_hook_thresh")
Signed-off-by: Artur Molchanov <artur.molchanov@synesis.ru>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-30 18:22:50 +01:00
David Ahern
f5a0aab84b net: ipv4: dst for local input routes should use l3mdev if relevant
IPv4 output routes already use l3mdev device instead of loopback for dst's
if it is applicable. Change local input routes to do the same.

This fixes icmp responses for unreachable UDP ports which are directed
to the wrong table after commit 9d1a6c4ea4 because local_input
routes use the loopback device. Moving from ingress device to loopback
loses the L3 domain causing responses based on the dst to get to lost.

Fixes: 9d1a6c4ea4 ("net: icmp_route_lookup should use rt dev to
		       determine L3 domain")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 22:27:23 -05:00
Wei Zhang
f0c16ba893 net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a82 ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a82 ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:07:41 -05:00
Mathias Krause
4775cc1f2d rtnl: stats - add missing netlink message size checks
We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.

Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.

Fixes: 10c9ead9f3 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:05:15 -05:00
Zheng Li
e4c5e13aa4 ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
There is an inconsistent conditional judgement between __ip6_append_data
and ip6_finish_output functions, the variable length in __ip6_append_data
just include the length of application's payload and udp6 header, don't
include the length of ipv6 header, but in ip6_finish_output use
(skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ipv6 header.

That causes some particular application's udp6 payloads whose length are
between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
though the rst->dev support UFO feature.

Add the length of ipv6 header to length in __ip6_append_data to keep
consistent conditional judgement as ip6_finish_output for ip6 fragment.

Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:55:17 -05:00
Augusto Mecking Caringi
9dd0f896d2 net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS
This patch fixes the following warnings when CONFIG_PROC_FS is not set:

linux/net/atm/lec.c: In function ‘lane_module_cleanup’:
linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first
use in this function)
remove_proc_entry("lec", atm_proc_root);
                           ^
linux/net/atm/lec.c:1062:27: note: each undeclared identifier is
reported only once for each function it appears in

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 15:11:32 -05:00
Paul Blakey
0df0f207aa net/sched: cls_flower: Fix missing addr_type in classify
Since we now use a non zero mask on addr_type, we are matching on its
value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
failed in SW/classify path since its value was zero.
This patch sets the proper value of addr_type for encapsulated packets.

Fixes: 970bfcd097 ('net/sched: cls_flower: Use mask for addr_type')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:28:13 -05:00
Linus Torvalds
8f18e4d03e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

 2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an enless loop in the tc chain. Fix from
    Daniel Borkmann.

 3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: stmmac: fix incorrect bit set in gmac4 mdio addr register
  r8169: add support for RTL8168 series add-on card.
  net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
  openvswitch: upcall: Fix vlan handling.
  ipv4: Namespaceify tcp_tw_reuse knob
  net: korina: Fix NAPI versus resources freeing
  net, sched: fix soft lockup in tc_classify
  net/mlx4_en: Fix user prio field in XDP forward
  tipc: don't send FIN message from connectionless socket
  ipvlan: fix multicast processing
  ipvlan: fix various issues in ipvlan_process_multicast()
2016-12-27 16:04:37 -08:00
Jason Wang
be26727772 net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
After commit 73b62bd085 ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c30.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
pravin shelar
df30f7408b openvswitch: upcall: Fix vlan handling.
Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
Haishuang Yan
56ab6b9300 ipv4: Namespaceify tcp_tw_reuse knob
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
Daniel Borkmann
628185cfdd net, sched: fix soft lockup in tc_classify
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-26 11:24:10 -05:00
Thomas Gleixner
1f3a8e49d8 ktime: Get rid of ktime_equal()
No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:23 +01:00
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner
2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Jon Paul Maloy
693c56491f tipc: don't send FIN message from connectionless socket
In commit 6f00089c73 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages most often is zero, and the message will be dropped, but
it is still a wrong and confusing behavior.

We fix this in this commit.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 17:53:47 -05:00
Marcelo Ricardo Leitner
1636098c46 sctp: fix recovering from 0 win with small data chunks
Currently if SCTP closes the receive window with window pressure, mostly
caused by excessive skb overhead on payload/overheads ratio, SCTP will
close the window abruptly while saving the delta on rwnd_press. It will
start recovering rwnd as the chunks are consumed by the application and
the rwnd_press will be only recovered after rwnd reach the same value as
of rwnd_press, mostly to prevent silly window syndrome.

Thing is, this is very inefficient with small data chunks, as with those
it will never reach back that value, and thus it will never recover from
such pressure. This means that we will not issue window updates when
recovering from 0 window and will rely on a sender retransmit to notice
it.

The fix here is to remove such threshold, as no value is good enough: it
depends on the (avg) chunk sizes being used.

Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
Rate limited to 845kbps, for visibility. Capture done at netserver side.

Previously:
01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
  (window update missed)
04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g

After:
02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
  (following data transmission triggered by window updates above)
03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 14:01:35 -05:00
Marcelo Ricardo Leitner
58b94d88de sctp: do not loose window information if in rwnd_over
It's possible that we receive a packet that is larger than current
window. If it's the first packet in this way, it will cause it to
increase rwnd_over. Then, if we receive another data chunk (specially as
SCTP allows you to have one data chunk in flight even during 0 window),
rwnd_over will be overwritten instead of added to.

In the long run, this could cause the window to grow bigger than its
initial size, as rwnd_over would be charged only for the last received
data chunk while the code will try open the window for all packets that
were received and had its value in rwnd_over overwritten. This, then,
can lead to the worsening of payload/buffer ratio and cause rwnd_press
to kick in more often.

The fix is to sum it too, same as is done for rwnd_press, so that if we
receive 3 chunks after closing the window, we still have to release that
same amount before re-opening it.

Log snippet from sctp_test exhibiting the issue:
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 14:01:35 -05:00
Ido Schimmel
53f800e3ba neigh: Send netevent after marking neigh as dead
neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.

This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.

Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e933d ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:31:18 -05:00
Dave Jones
a98f917589 ipv6: handle -EFAULT from skb_copy_bits
By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:20:39 -05:00
Willem de Bruijn
39b2dd765e inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:19:18 -05:00
Or Gerlitz
d9724772e6 net/sched: cls_flower: Mandate mask when matching on flags
When matching on flags, we should require the user to provide the
mask and avoid using an all-ones mask. Not doing so causes matching
on flags provided w.o mask to hit on the value being unset for all
flags, which may not what the user wanted to happen.

Fixes: faa3ffce78 ('net/sched: cls_flower: Add support for matching on flags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 11:59:56 -05:00
Or Gerlitz
dc594ecd41 net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
The UDP dst port was provided to the helper function which sets the
IPv6 IP tunnel meta-data under a wrong param order, fix that.

Fixes: 75bfbca01e ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 11:59:56 -05:00
Xin Long
6c5d5cfbe3 netfilter: ipt_CLUSTERIP: check duplicate config when initializing
Now when adding an ipt_CLUSTERIP rule, it only checks duplicate config in
clusterip_config_find_get(). But after that, there may be still another
thread to insert a config with the same ip, then it leaves proc_create_data
to do duplicate check.

It's more reasonable to check duplicate config by ipt_CLUSTERIP itself,
instead of checking it by proc fs duplicate file check. Before, when proc
fs allowed duplicate name files in a directory, It could even crash kernel
because of use-after-free.

This patch is to check duplicate config under the protection of clusterip
net lock when initializing a new config and correct the return err.

Note that it also moves proc file node creation after adding new config, as
proc_create_data may sleep, it couldn't be called under the clusterip_net
lock. clusterip_config_find_get returns NULL if c->pde is null to make sure
it can't be used until the proc file node creation is done.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-23 15:20:01 +01:00
Lorenzo Colitti
7d99569460 net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
Commit e2d118a1cb ("net: inet: Support UID-based routing in IP
protocols.") made ip_do_redirect call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.

Fixes: e2d118a1cb ("net: inet: Support UID-based routing in IP protocols.")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-22 11:15:10 -05:00
Eric Dumazet
0a9648f129 tcp: add a missing barrier in tcp_tasklet_func()
Madalin reported crashes happening in tcp_tasklet_func() on powerpc64

Before TSQ_QUEUED bit is cleared, we must ensure the changes done
by list_del(&tp->tsq_node); are committed to memory, otherwise
corruption might happen, as an other cpu could catch TSQ_QUEUED
clearance too soon.

We can notice that old kernels were immune to this bug, because
TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
section, but they could have missed a kick to write additional bytes,
when NIC interrupts for a given flow are spread to multiple cpus.

Affected TCP flows would need an incoming ACK or RTO timer to add more
packets to the pipe. So overall situation should be better now.

Fixes: b223feb9de ("tcp: tsq: add shortcut in tcp_tasklet_func()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Madalin Bucur <madalin.bucur@nxp.com>
Tested-by: Madalin Bucur <madalin.bucur@nxp.com>
Tested-by: Xing Lei <xing.lei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-21 15:30:27 -05:00
Geliang Tang
a763f78cea RDS: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:49 -05:00
Geliang Tang
7f7cd56c33 net_sched: sch_netem: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:48 -05:00
Geliang Tang
e124557d60 net_sched: sch_fq: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:48 -05:00
Xin Long
b8607805dd sctp: not copying duplicate addrs to the assoc's bind address list
sctp.local_addr_list is a global address list that is supposed to include
all the local addresses. sctp updates this list according to NETDEV_UP/
NETDEV_DOWN notifications.

However, if multiple NICs have the same address, the global list would
have duplicate addresses. Even if for one NIC, promote secondaries in
__inet_del_ifa can also lead to accumulating duplicate addresses.

When sctp binds address 'ANY' and creates a connection, it copies all
the addresses from global list into asoc's bind addr list, which makes
sctp pack the duplicate addresses into INIT/INIT_ACK packets.

This patch is to filter the duplicate addresses when copying the addrs
from global list in sctp_copy_local_addr_list and unpacking addr_param
from cookie in sctp_raw_to_bind_addrs to asoc's bind addr list.

Note that we can't filter the duplicate addrs when global address list
gets updated, As NETDEV_DOWN event may remove an addr that still exists
in another NIC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:15:45 -05:00
Xin Long
165f2cf640 sctp: reduce indent level in sctp_copy_local_addr_list
This patch is to reduce indent level by using continue when the addr
is not allowed, and also drop end_copy by using break.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:15:44 -05:00
Jarno Rajahalme
87e159c59d openvswitch: Add a missing break statement.
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:07:41 -05:00
zheng li
0a28cfd51e ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.

That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 10:45:22 -05:00
Linus Torvalds
52f40e9d65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes and cleanups from David Miller:

 1) Revert bogus nla_ok() change, from Alexey Dobriyan.

 2) Various bpf validator fixes from Daniel Borkmann.

 3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04
    drivers, from Dongpo Li.

 4) Several ethtool ksettings conversions from Philippe Reynes.

 5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

 6) XDP support for virtio_net, from John Fastabend.

 7) Fix NAT handling within a vrf, from David Ahern.

 8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
  net: mv643xx_eth: fix build failure
  isdn: Constify some function parameters
  mlxsw: spectrum: Mark split ports as such
  cgroup: Fix CGROUP_BPF config
  qed: fix old-style function definition
  net: ipv6: check route protocol when deleting routes
  r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
  irda: w83977af_ir: cleanup an indent issue
  net: sfc: use new api ethtool_{get|set}_link_ksettings
  net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
  net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
  net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
  net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
  bpf: fix mark_reg_unknown_value for spilled regs on map value marking
  bpf: fix overflow in prog accounting
  bpf: dynamically allocate digest scratch buffer
  gtp: Fix initialization of Flags octet in GTPv1 header
  gtp: gtp_check_src_ms_ipv4() always return success
  net/x25: use designated initializers
  isdn: use designated initializers
  ...
2016-12-17 20:17:04 -08:00
David S. Miller
67a72a5891 Three fixes:
* avoid a WARN_ON() when trying to use WEP with AP_VLANs
  * ensure enough headroom on mesh forwarding packets
  * don't report unknown/invalid rates to userspace
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYU+BMAAoJEGt7eEactAAdtnkP/1/Rftp3Ll2m+EqJn1yNUCiQ
 K/XC5xT1khSavb/+Tdv4W8Fy40y008Ljrn5a4erB1g0YhXQXbtp43hxmHrWa9Q8q
 pDngQspUUI14hX9JBaWsvMnvXNjy5KK2DsEHxdmjEYy8+eSzqwefc0VqBFvQoEhT
 X3WWTm4rgn2dtI9QhfSKp4eyStNRjavqAtUhPI1TqCYZRyLrICQ2nQ6yrTjbEN9Z
 Wtwgo3VUTv30MuSuijklgi7mXnY+yjj35qBYMqlnKtArSrMIVRExwN9LybWOVCOx
 k9o+aDJCNX21V1l2TC5IlRksQBbjhjcMP71v/SFTYJlmph8eS4ZtrApoZyr0v5TD
 1IBfgyHDl4jcsFi1v+PtgpGhh0OV+KHeh1k2lK2YTq2UfPdiuevv1OhAp3H0ZyPj
 ZpSEKpdKqOV/R8G3g6K0YTnf/Jvd5F6aVZcrgeQGNUD8QNDavUrdcfl1sqwCMrl6
 LVZmVh8bYPKEqGzpsG8Xa//4eu6Xbgz9ny49CXwLcpVnUbhOXZLawGoYd4g9Tb0/
 j3HbM5KNNCp5U9EY9jQ22ih2xgt3ntiBkZUNPUczroqZ1U1v/rdusOb7oiF520jo
 iWQeE2+XUYHIAokVE4iMP2SS55LI1BkDr5cGtysw5PSneVzpnLSMi3qIa2nZV1jA
 Z6w0fDb5gvYVpt0viXQE
 =QHRs
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three fixes:
 * avoid a WARN_ON() when trying to use WEP with AP_VLANs
 * ensure enough headroom on mesh forwarding packets
 * don't report unknown/invalid rates to userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 21:41:33 -05:00
Mantas M
c2ed1880fd net: ipv6: check route protocol when deleting routes
The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).

This can be verified using `ip -6 route del <prefix> proto something`.

Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 21:37:06 -05:00
Kees Cook
e999cb43d5 net/x25: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
Kees Cook
9d1c0ca5e1 net: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
Kees Cook
99a5e178bd ATM: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
John Fastabend
f23bc46c30 net: xdp: add invalid buffer warning
This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen but to catch
this in virtual/qemu setups that I may not have expected from the
emulation layer having a standard warning is useful.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
Xin Long
08abb79542 sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null
Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.

This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.

Fixes: 1cceda7849 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:43:23 -05:00
Xin Long
5cb2cd68dd sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lock
Since commit 7fda702f93 ("sctp: use new rhlist interface on sctp transport
rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.

It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
as __sctp_lookup_association is called in rx path or sctp_lookup_association
which are in the protection of rcu_read_lock() already.

But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
usage' in __rhashtable_lookup.

This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
before calling sctp_epaddr_lookup_transport.

Fixes: 7fda702f93 ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:43:23 -05:00
LABBE Corentin
616f6b4023 irda: irnet: add member name to the miscdevice declaration
Since the struct miscdevice have many members, it is dangerous to init
it without members name relying only on member order.

This patch add member name to the init declaration.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:29:31 -05:00
LABBE Corentin
33de4d1bb9 irda: irnet: Remove unused IRNET_MAJOR define
The IRNET_MAJOR define is not used, so this patch remove it.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:17:24 -05:00
LABBE Corentin
24c946cc5d irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h
This patch move the define for IRNET_MINOR to include/linux/miscdevice.h
It is better that all minor number definitions are in the same place.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:17:24 -05:00
LABBE Corentin
e292823559 irda: irnet: Move linux/miscdevice.h include
The only use of miscdevice is irda_ppp so no need to include
linux/miscdevice.h for all irda files.
This patch move the linux/miscdevice.h include to irnet_ppp.h

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:16:31 -05:00
LABBE Corentin
078497a4d9 irda: irproc.c: Remove unneeded linux/miscdevice.h include
irproc.c does not use any miscdevice so this patch remove this
unnecessary inclusion.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:16:31 -05:00
Tom Herbert
0643ee4fd1 inet: Fix get port to handle zero port number with soreuseport set
A user may call listen with binding an explicit port with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent a creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unreleated port
number that was already created with soreuseport.

This patch adds a boolean parameter to inet_bind_conflict that indicates
rather soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if port scan is done.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:13:19 -05:00
Tom Herbert
9af7e923fd inet: Don't go into port scan when looking for specific bind port
inet_csk_get_port is called with port number (snum argument) that may be
zero or nonzero. If it is zero, then the intent is to find an available
ephemeral port number to bind to. If snum is non-zero then the caller
is asking to allocate a specific port number. In the latter case we
never want to perform the scan in ephemeral port range. It is
conceivable that this can happen if the "goto again" in "tb_found:"
is done. This patch adds a check that snum is zero before doing
the "goto again".

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:13:19 -05:00
Paul Blakey
f93bd17b91 net/sched: cls_flower: Use masked key when calling HW offloads
Zero bits on the mask signify a "don't care" on the corresponding bits
in key. Some HWs require those bits on the key to be zero. Since these
bits are masked anyway, it's okay to provide the masked key to all
drivers.

Fixes: 5b33f48842 ('net/flower: Introduce hardware offload support')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 10:44:35 -05:00
Paul Blakey
970bfcd097 net/sched: cls_flower: Use mask for addr_type
When addr_type is set, mask should also be set.

Fixes: 66530bdf85 ('sched,cls_flower: set key address type when present')
Fixes: bc3103f1ed ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 10:44:35 -05:00
Linus Torvalds
59331c215d A varied set of changes:
- a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
   (myself).  Also fixed a deadlock caused by a bogus allocation on the
   writeback path and authorize reply verification.
 
 - a fix for long stalls during fsync (Jeff Layton).  The client now
   has a way to force the MDS log flush, leading to ~100x speedups in
   some synthetic tests.
 
 - a new [no]require_active_mds mount option (Zheng Yan).  On mount, we
   will now check whether any of the MDSes are available and bail rather
   than block if none are.  This check can be avoided by specifying the
   "no" option.
 
 - a couple of MDS cap handling fixes and a few assorted patches
   throughout.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJYVByGAAoJEEp/3jgCEfOLBqkH/A7nVf7ObSDYmLuYgg1gJ8zq
 4zDDE42S4yZwayAVpn3UjbfPuez5J44lsdXitExdfiHOdIQZDa/WqAbSqQ48HCSg
 7sG6ecRWg3G5zG0psPZnB+S5wGMvsLXmj2hvzV1lt2t0lI5bDLSlNRSnElbhilD/
 8Z7+Ni2go8DMC9o49SJU32lBW7IByKl4p4flveItgwUvGkIFNd8OT3CyPBUqonQs
 lRCeImRYU8Jghb+ifnRxWSbuDf7pZAPc9kL0vibpUUT/1bH6iHsedKp37WQKqc/w
 KDSNnKiZcz0gY/hJeLqE3ymCIKO6SU+JkMQSaYNTouLO5fQsRr8/uWQXSe6S5oc=
 =ypWx
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A varied set of changes:

   - a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
     (myself). Also fixed a deadlock caused by a bogus allocation on the
     writeback path and authorize reply verification.

   - a fix for long stalls during fsync (Jeff Layton). The client now
     has a way to force the MDS log flush, leading to ~100x speedups in
     some synthetic tests.

   - a new [no]require_active_mds mount option (Zheng Yan).

     On mount, we will now check whether any of the MDSes are available
     and bail rather than block if none are. This check can be avoided
     by specifying the "no" option.

   - a couple of MDS cap handling fixes and a few assorted patches
     throughout"

* tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client: (32 commits)
  libceph: remove now unused finish_request() wrapper
  libceph: always signal completion when done
  ceph: avoid creating orphan object when checking pool permission
  ceph: properly set issue_seq for cap release
  ceph: add flags parameter to send_cap_msg
  ceph: update cap message struct version to 10
  ceph: define new argument structure for send_cap_msg
  ceph: move xattr initialzation before the encoding past the ceph_mds_caps
  ceph: fix minor typo in unsafe_request_wait
  ceph: record truncate size/seq for snap data writeback
  ceph: check availability of mds cluster on mount
  ceph: fix splice read for no Fc capability case
  ceph: try getting buffer capability for readahead/fadvise
  ceph: fix scheduler warning due to nested blocking
  ceph: fix printing wrong return variable in ceph_direct_read_write()
  crush: include mapper.h in mapper.c
  rbd: silence bogus -Wmaybe-uninitialized warning
  libceph: no need to drop con->mutex for ->get_authorizer()
  libceph: drop len argument of *verify_authorizer_reply()
  libceph: verify authorize reply on connect
  ...
2016-12-16 11:23:34 -08:00
Linus Torvalds
ff0f962ca3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This update contains:

   - try to clone on copy-up

   - allow renaming a directory

   - split source into managable chunks

   - misc cleanups and fixes

  It does not contain the read-only fd data inconsistency fix, which Al
  didn't like. I'll leave that to the next year..."

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (36 commits)
  ovl: fix reStructuredText syntax errors in documentation
  ovl: fix return value of ovl_fill_super
  ovl: clean up kstat usage
  ovl: fold ovl_copy_up_truncate() into ovl_copy_up()
  ovl: create directories inside merged parent opaque
  ovl: opaque cleanup
  ovl: show redirect_dir mount option
  ovl: allow setting max size of redirect
  ovl: allow redirect_dir to default to "on"
  ovl: check for emptiness of redirect dir
  ovl: redirect on rename-dir
  ovl: lookup redirects
  ovl: consolidate lookup for underlying layers
  ovl: fix nested overlayfs mount
  ovl: check namelen
  ovl: split super.c
  ovl: use d_is_dir()
  ovl: simplify lookup
  ovl: check lower existence of rename target
  ovl: rename: simplify handling of lower/merged directory
  ...
2016-12-16 10:58:12 -08:00
Linus Torvalds
759b2656b2 The one new feature is support for a new NFSv4.2 mode_umask attribute
that makes ACL inheritance a little more useful in environments that
 default to restrictive umasks.  Requires client-side support, also on
 its way for 4.10.
 
 Other than that, miscellaneous smaller fixes and cleanup, especially to
 the server rdma code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYVAEqAAoJECebzXlCjuG+VM0QAKaR+ibSM31Ahpnrgit5/wrb
 n630KDFztO7iqEeuHfPQ4/n05T2QR0JWsLpjLMFvx88Gy4gyXYk9cuDPIrNKX1IS
 3/nnhBo0+EVnjODjufommCrtbPZlqOSsS3N03vWkB7rTi8QYsWBOThh+XLRJYOXo
 LZzJE1WmXNeCXV1kXPBsauryywql1fmwTXBzmIf1HbzoGAVROMEA2qqh4Z3nb7BP
 sJuGchWx0STBOuAa278ighXQPUW2lUft9uzw2bssOtMwfNyOs/Pd6nx4F1Lg6WwD
 1UQXoiR8K3PqelZfoeFJ05v0css/sbNKep+huWRdOXZj3Kjpa20lKBX8xHfat7sN
 1OQ4FHx8ToigX3c+wwtlCqRMCcIxqUYkRjqzPHyeBiSSSp0rLrId44rI5x/K0yay
 3bkGw7hFDSzc0Nq2uZgmtlbyTC71hLNhkWe7ThofcVG/pS0JtAqBiKIVwXJPh/e0
 PLmVHYGU6Xowjag5edJlXY1tlIlxtWfqsWUarCXS5bfKUa3UjMVSjyuljsDqqJsn
 96fEWu7DiUo4HeGYmf8MJoeZYV2y0DKSQGeguVkUKWp2DoTzinQHTfdKvrZVwNuu
 hVE9/QeWzUvPY13HOUaKD2skozhbUChqv0NHESKUv8gxE3svTEpYZkXrE74WNqMk
 l/WXAhw+RdKZof4+qdjU
 =JANY
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "The one new feature is support for a new NFSv4.2 mode_umask attribute
  that makes ACL inheritance a little more useful in environments that
  default to restrictive umasks. Requires client-side support, also on
  its way for 4.10.

  Other than that, miscellaneous smaller fixes and cleanup, especially
  to the server rdma code"

[ The client side of the umask attribute was merged yesterday ]

* tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux:
  nfsd: add support for the umask attribute
  sunrpc: use DEFINE_SPINLOCK()
  svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
  svcrdma: Break up dprintk format in svc_rdma_accept()
  svcrdma: Remove unused variable in rdma_copy_tail()
  svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
  svcrdma: Remove svc_rdma_op_ctxt::wc_status
  svcrdma: Remove DMA map accounting
  svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
  svcrdma: Renovate sendto chunk list parsing
  svcauth_gss: Close connection when dropping an incoming message
  svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
  nfsd: constify reply_cache_stats_operations structure
  nfsd: update workqueue creation
  sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code
  nfsd: catch errors in decode_fattr earlier
  nfsd: clean up supported attribute handling
  nfsd: fix error handling for clients that fail to return the layout
  nfsd: more robust allocation failure handling in nfsd_reply_cache_init
2016-12-16 10:48:28 -08:00
Linus Torvalds
9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Miklos Szeredi
beef5121f3 Revert "af_unix: fix hard linked sockets on overlay"
This reverts commit eb0a4a47ae.

Since commit 51f7e52dc9 ("ovl: share inode for hard link") there's no
need to call d_real_inode() to check two overlay inodes for equality.

Side effect of this revert is that it's no longer possible to connect one
socket on overlayfs to one on the underlying layer (something which didn't
make sense anyway).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16 11:02:53 +01:00
Linus Torvalds
73e2e0c9b1 NFS client updates for Linux 4.10
Highlights include:
 
 Stable bugfixes:
  - Fix a pnfs deadlock between read resends and layoutreturn
  - Don't invalidate the layout stateid while a layout return is outstanding
  - Don't schedule a layoutreturn if the layout stateid is marked as invalid
  - On a pNFS error, do not send LAYOUTGET until the LAYOUTRETURN is complete
  - SUNRPC: fix refcounting problems with auth_gss messages.
 
 Features:
  - Add client support for the NFSv4 umask attribute.
  - NFSv4: Correct support for flock() stateids.
  - Add a LAYOUTRETURN operation to CLOSE and DELEGRETURN when return-on-close
    is specified
  - Allow the pNFS/flexfiles layoutstat information to piggyback on LAYOUTRETURN
  - Optimise away redundant GETATTR calls when doing state recovery and/or
    when not required by cache revalidation rules or close-to-open cache
    consistency.
  - Attribute cache improvements
  - RPC/RDMA support for SG_GAP devices
 
 Bugfixes:
  - NFS: Fix performance regressions in readdir
  - pNFS/flexfiles: Fix a deadlock on LAYOUTGET
  - NFSv4: Add missing nfs_put_lock_context()
  - NFSv4.1: Fix regression in callback retry handling
  - Fix false positive NFSv4.0 trunking detection.
  - pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated
  - Various layout stateid related bugfixes
  - RPC/RDMA bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUyemAAoJEGcL54qWCgDy96wP/Ry86cknfLUqLKJCbFVV4nV8
 HdovCY8if8JQO0HUPDJ25ITvoRJNVRRwJMWnVq5XHRrPUHletDks6/UYfa63UDMv
 umHvGST1cQPU1G+vBIQ3sdkVi1X1GeyBY4rU8aDWxLyKWwyeNptCK12i80ifyaGV
 GZIIxuKVDOFS15M7NwMPRkrBacF8TyVK6S7275z6ZNmhFtvYwMAbvMxLabTwWAe8
 4A03m4RDBTYhQIc2xLJbHfOTYoHi34l90wrn3C7Wv0I2zp8EJlzCY2tSbYKhfPg7
 0HVKNdruRL+cHwLwJEcjFbxOg9MArgRxyup3dwAYQq7Ivsf9oR8/D61CDhanXAzy
 cAWyrCyxaAoPWCOb8k4OFRh6jOF9LBGb5WTNpXRi1LoGrbvi6/WLlJccV60325wd
 gmSAiwIE7aLG8pFk54J0Et86VaQ6qQNBUtJY/4m87uf1FSv3yzQvh7qDr7s+t8ZQ
 kmSTZJzMWZLEEeyvEPZCfjygFu7n4PuTePJu31217styvat39TpY2p0HaaMhgC0V
 /Y0ygGH7VlGp0oaVQ70CtBzGsCWTKU2DU8di7nvsCKg6iLv89QBILIJhVeP42tKd
 juNCWVw4bpW1Zex7HXKecKfMXkDJ4qSDLFzGWj6Ue85f/rCOSKQH01jfvwBlvtBc
 3E6fk85ExTw2+siHWiGy
 =MapM
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable bugfixes:
   - Fix a pnfs deadlock between read resends and layoutreturn
   - Don't invalidate the layout stateid while a layout return is
     outstanding
   - Don't schedule a layoutreturn if the layout stateid is marked as
     invalid
   - On a pNFS error, do not send LAYOUTGET until the LAYOUTRETURN is
     complete
   - SUNRPC: fix refcounting problems with auth_gss messages.

  Features:
   - Add client support for the NFSv4 umask attribute.
   - NFSv4: Correct support for flock() stateids.
   - Add a LAYOUTRETURN operation to CLOSE and DELEGRETURN when
     return-on-close is specified
   - Allow the pNFS/flexfiles layoutstat information to piggyback on
     LAYOUTRETURN
   - Optimise away redundant GETATTR calls when doing state recovery
     and/or when not required by cache revalidation rules or
     close-to-open cache consistency.
   - Attribute cache improvements
   - RPC/RDMA support for SG_GAP devices

  Bugfixes:
   - NFS: Fix performance regressions in readdir
   - pNFS/flexfiles: Fix a deadlock on LAYOUTGET
   - NFSv4: Add missing nfs_put_lock_context()
   - NFSv4.1: Fix regression in callback retry handling
   - Fix false positive NFSv4.0 trunking detection.
   - pNFS/flexfiles: Only send layoutstats updates for mirrors that were
     updated
   - Various layout stateid related bugfixes
   - RPC/RDMA bugfixes"

* tag 'nfs-for-4.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (82 commits)
  SUNRPC: fix refcounting problems with auth_gss messages.
  nfs: add support for the umask attribute
  pNFS/flexfiles: Ensure we have enough buffer for layoutreturn
  pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr()
  pNFS/flexfiles: Fix a deadlock on LAYOUTGET
  pNFS: Layoutreturn must free the layout after the layout-private data
  pNFS/flexfiles: Fix ff_layout_add_ds_error_locked()
  NFSv4: Add missing nfs_put_lock_context()
  pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid
  NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery()
  NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE
  NFS: Only look at the change attribute cache state in nfs_check_verifier
  NFS: Fix incorrect size revalidation when holding a delegation
  NFS: Fix incorrect mapping revalidation when holding a delegation
  pNFS/flexfiles: Support sending layoutstats in layoutreturn
  pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn
  NFS: Fix up read of mirror stats
  pNFS/flexfiles: Clean up layoutstats
  pNFS/flexfiles: Refactor encoding of the layoutreturn payload
  pNFS: Add a layoutreturn callback to performa layout-private setup
  ...
2016-12-15 18:47:31 -08:00
Linus Torvalds
ed3c5a0be3 virtio, vhost: new device, fixes, speedups
This includes the new virtio crypto device, and fixes all over the
 place.  In particular enabling endian-ness checks for sparse builds
 found some bugs which this fixes.  And it appears that everyone is in
 agreement that disabling endian-ness sparse checks shouldn't be
 necessary any longer.
 
 So this enables them for everyone, and drops __CHECK_ENDIAN__
 and __bitwise__ APIs.
 
 IRQ handling in virtio has been refactored somewhat, the
 larger switch to IRQ_SHARED will have to wait as
 it proved too aggressive.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYUxYEAAoJECgfDbjSjVRp5lgH/22HKRyb3+M+z3oH6R9rJmz5
 T4y3XI4yDOTlh93VzxlrHjHNBnoWRvzV5hn6BKH6bTbSZ87TabNhfws11FKGvhER
 G1ipl/DvwytvvWgZ5dFdcC4x/0wpWawt2jgpEpPP33VDVkGJFEEAGj6GX10ClX99
 ggrNfzUCHOAFaIWzC29i7gYMnYHIJDUqK6ycDxZebzsE/c12SNRGASxei2D+6eYC
 YkdVg0c/d7Wsk+ZO1ugiA6omO4UdvPAVvxUkvd4YphRikwEWH7gGuz558wiSo4VN
 iEMZvyYXSEjx4B2Hg8+mH63zWROEpCmaToUix9+4AF7YhkaeX5fICNdkAPdtxc8=
 =urXH
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio, vhost: new device, fixes, speedups

  This includes the new virtio crypto device, and fixes all over the
  place. In particular enabling endian-ness checks for sparse builds
  found some bugs which this fixes. And it appears that everyone is in
  agreement that disabling endian-ness sparse checks shouldn't be
  necessary any longer.

  So this enables them for everyone, and drops the __CHECK_ENDIAN__ and
  __bitwise__ APIs.

  IRQ handling in virtio has been refactored somewhat, the larger switch
  to IRQ_SHARED will have to wait as it proved too aggressive"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  Makefile: drop -D__CHECK_ENDIAN__ from cflags
  fs/logfs: drop __CHECK_ENDIAN__
  Documentation/sparse: drop __CHECK_ENDIAN__
  linux: drop __bitwise__ everywhere
  checkpatch: replace __bitwise__ with __bitwise
  Documentation/sparse: drop __bitwise__
  tools: enable endian checks for all sparse builds
  linux/types.h: enable endian checks for all sparse builds
  virtio_mmio: Set dev.release() to avoid warning
  vhost: remove unused feature bit
  virtio_ring: fix description of virtqueue_get_buf
  vhost/scsi: Remove unused but set variable
  tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h
  vringh: kill off ACCESS_ONCE()
  tools/virtio: fix READ_ONCE()
  crypto: add virtio-crypto driver
  vhost: cache used event for better performance
  vsock: lookup and setup guest_cid inside vhost_vsock_lock
  virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants
  virtio_pci: merge vp_free_vectors into vp_del_vqs
  ...
2016-12-15 18:13:41 -08:00
Michael S. Tsirkin
6bdf1e0efb Makefile: drop -D__CHECK_ENDIAN__ from cflags
That's the default now, no need for makefiles to set it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
2016-12-16 00:13:43 +02:00
Michael S. Tsirkin
9efeccacd3 linux: drop __bitwise__ everywhere
__bitwise__ used to mean "yes, please enable sparse checks
unconditionally", but now that we dropped __CHECK_ENDIAN__
__bitwise is exactly the same.
There aren't many users, replace it by __bitwise everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Akced-by: Lee Duncan <lduncan@suse.com>
2016-12-16 00:13:41 +02:00
Linus Torvalds
4d5b57e05a Updates for 4.10 kernel merge window
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
   tree has already been merged)
 - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
 - Debug cleanups
 - New connection rejection helpers
 - SRP updates
 - Various misc fixes
 - New paravirt driver from vmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
 dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
 lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
 GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
 flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
 YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
 FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
 PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
 4uSHJqpf3YEYylxGMhk3
 =QeGy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the
  entirely new VMWare pvrdma driver. You may have noticed that there
  were changes in DaveM's pull request to the bnxt Ethernet driver to
  support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
  be pulled in this release cycle, but it simply wasn't ready in time
  and was dropped (a few review comments still to address, and some
  multi-arch build issues like prefetch() not working across all
  arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if
     Dave's tree has already been merged)

   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

   - debug cleanups

   - new connection rejection helpers

   - SRP updates

   - various misc fixes

   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-15 12:03:32 -08:00
Ben Greear
a17d93ff3a mac80211: fix legacy and invalid rx-rate report
This fixes obtaining the rate info via sta_set_sinfo
when the rx rate is invalid (for instance, on IBSS
interface that has received no frames from one of its
peers).

Also initialize rinfo->flags for legacy rates, to not
rely on the whole sinfo being initialized to zero.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 10:54:48 +01:00
Michael S. Tsirkin
f83f12d660 vsock/virtio: fix src/dst cid format
These fields are 64 bit, using le32_to_cpu and friends
on these will not do the right thing.
Fix this up.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:19 +02:00
Michael S. Tsirkin
819483d806 vsock/virtio: mark an internal function static
virtio_transport_alloc_pkt is only used locally, make it static.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:19 +02:00
Michael S. Tsirkin
6c7efafdd5 vsock/virtio: add a missing __le annotation
guest cid is read from config space, therefore it's in little endian
format and is treated as such, annotate it accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:18 +02:00
Linus Torvalds
a57cb1c1d7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a few misc things

 - kexec updates

 - DMA-mapping updates to better support networking DMA operations

 - IPC updates

 - various MM changes to improve DAX fault handling

 - lots of radix-tree changes, mainly to the test suite. All leading up
   to reimplementing the IDA/IDR code to be a wrapper layer over the
   radix-tree. However the final trigger-pulling patch is held off for
   4.11.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  radix tree test suite: delete unused rcupdate.c
  radix tree test suite: add new tag check
  radix-tree: ensure counts are initialised
  radix tree test suite: cache recently freed objects
  radix tree test suite: add some more functionality
  idr: reduce the number of bits per level from 8 to 6
  rxrpc: abstract away knowledge of IDR internals
  tpm: use idr_find(), not idr_find_slowpath()
  idr: add ida_is_empty
  radix tree test suite: check multiorder iteration
  radix-tree: fix replacement for multiorder entries
  radix-tree: add radix_tree_split_preload()
  radix-tree: add radix_tree_split
  radix-tree: add radix_tree_join
  radix-tree: delete radix_tree_range_tag_if_tagged()
  radix-tree: delete radix_tree_locate_item()
  radix-tree: improve multiorder iterators
  btrfs: fix race in btrfs_free_dummy_fs_info()
  radix-tree: improve dump output
  radix-tree: make radix_tree_find_next_bit more useful
  ...
2016-12-14 17:25:18 -08:00
Matthew Wilcox
444306129a rxrpc: abstract away knowledge of IDR internals
Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Pablo Neira Ayuso
053d20f571 netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
If the NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag is set, then mangle layer 4
checksum. This should not depend on csum_type NFT_PAYLOAD_CSUM_INET
since IPv6 header has no checksum field, but still an update of any of
the pseudoheader fields may trigger a layer 4 checksum update.

Fixes: 1814096980 ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:11 +01:00
Florian Westphal
3e38df136e netfilter: nf_tables: fix oob access
BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:06 +01:00
Pablo Neira Ayuso
c2e756ff9e netfilter: nft_queue: use raw_smp_processor_id()
Using smp_processor_id() causes splats with PREEMPT_RCU:

[19379.552780] BUG: using smp_processor_id() in preemptible [00000000] code: ping/32389
[19379.552793] caller is debug_smp_processor_id+0x17/0x19
[...]
[19379.552823] Call Trace:
[19379.552832]  [<ffffffff81274e9e>] dump_stack+0x67/0x90
[19379.552837]  [<ffffffff8129a4d4>] check_preemption_disabled+0xe5/0xf5
[19379.552842]  [<ffffffff8129a4fb>] debug_smp_processor_id+0x17/0x19
[19379.552849]  [<ffffffffa07c42dd>] nft_queue_eval+0x35/0x20c [nft_queue]

No need to disable preemption since we only fetch the numeric value, so
let's use raw_smp_processor_id() instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:01 +01:00
Pablo Neira Ayuso
8010d7feb2 netfilter: nft_quota: reset quota after dump
Dumping of netlink attributes may fail due to insufficient room in the
skbuff, so let's reset consumed quota if we succeed to put netlink
attributes into the skbuff.

Fixes: 43da04a593 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:38:51 +01:00
Linus Torvalds
dcdaa2f948 Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "After the small number of patches for v4.9, we've got a much bigger
  pile for v4.10.

  The bulk of these patches involve a rework of the audit backlog queue
  to enable us to move the netlink multicasting out of the task/thread
  that generates the audit record and into the kernel thread that emits
  the record (just like we do for the audit unicast to auditd).

  While we were playing with the backlog queue(s) we fixed a number of
  other little problems with the code, and from all the testing so far
  things look to be in much better shape now. Doing this also allowed us
  to re-enable disabling IRQs for some netns operations ("netns: avoid
  disabling irq for netns id").

  The remaining patches fix some small problems that are well documented
  in the commit descriptions, as well as adding session ID filtering
  support"

* 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
  audit: use proper refcount locking on audit_sock
  netns: avoid disabling irq for netns id
  audit: don't ever sleep on a command record/message
  audit: handle a clean auditd shutdown with grace
  audit: wake up kauditd_thread after auditd registers
  audit: rework audit_log_start()
  audit: rework the audit queue handling
  audit: rename the queues and kauditd related functions
  audit: queue netlink multicast sends just like we do for unicast sends
  audit: fixup audit_init()
  audit: move kaudit thread start from auditd registration to kaudit init (#2)
  audit: add support for session ID user filter
  audit: fix formatting of AUDIT_CONFIG_CHANGE events
  audit: skip sessionid sentinel value when auto-incrementing
  audit: tame initialization warning len_abuf in audit_log_execve_info
  audit: less stack usage for /proc/*/loginuid
2016-12-14 14:06:40 -08:00
Ilya Dryomov
45ee2c1d66 libceph: remove now unused finish_request() wrapper
Kill the wrapper and rename __finish_request() to finish_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-14 22:39:08 +01:00
Ilya Dryomov
c297eb4269 libceph: always signal completion when done
r_safe_completion is currently, and has always been, signaled only if
on-disk ack was requested.  It's there for fsync and syncfs, which wait
for in-flight writes to flush - all data write requests set ONDISK.

However, the pool perm check code introduced in 4.2 sends a write
request with only ACK set.  An unfortunately timed syncfs can then hang
forever: r_safe_completion won't be signaled because only an unsafe
reply was requested.

We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
that is somewhat incomplete and yet another special case.  Instead,
rename this completion to r_done_completion and always signal it when
the OSD client is done with the request, whether unsafe, safe, or
error.  This is a bit cleaner and helps with the cancellation code.

Reported-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-14 22:39:08 +01:00
Paul Moore
fba143c66a netns: avoid disabling irq for netns id
Bring back commit bc51dddf98 ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with original attempt.  Additional information, and history,
can be found in the links below:

 * https://github.com/linux-audit/audit-kernel/issues/22
 * https://github.com/linux-audit/audit-kernel/issues/23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14 13:06:04 -05:00
Steve Wise
39384f04d0 rds_rdma: log the connection reject message
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Cedric Izoard
d8da0b5d64 mac80211: Ensure enough headroom when forwarding mesh pkt
When a buffer is duplicated during MESH packet forwarding,
this patch ensures that the new buffer has enough headroom.

Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:08:37 +01:00
Johannes Berg
ec4efc4a10 mac80211: don't call drv_set_default_unicast_key() for VLANs
Since drivers know nothing about AP_VLAN interfaces, trying to
call drv_set_default_unicast_key() just results in a warning
and no call to the driver. Avoid the warning by not calling the
driver for this on AP_VLAN interfaces.

This means that drivers that somehow need this call for AP mode
will fail to work properly in the presence of VLAN interfaces,
but the current drivers don't seem to use it, and mac80211 will
select and indicate the key - so drivers should be OK now.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 15:57:59 +01:00
Linus Torvalds
e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Tobias Klauser
f6c0d1a3ed crush: include mapper.h in mapper.c
Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of
crush_find_rule and crush_do_rule which are defined there. This fixes
the following GCC warnings when building with 'W=1':

  net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes]
  net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[idryomov@gmail.com: corresponding !__KERNEL__ include]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12 23:54:26 +01:00
Ilya Dryomov
b3bbd3f2ab libceph: no need to drop con->mutex for ->get_authorizer()
->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
->check_message_signature() shouldn't be doing anything with or on the
connection (like closing it or sending messages).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:21 +01:00
Ilya Dryomov
0dde584882 libceph: drop len argument of *verify_authorizer_reply()
The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:21 +01:00
Ilya Dryomov
5c056fdc5b libceph: verify authorize reply on connect
After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the the messenger.

AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd1b ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").

The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down.  Pass 0 to facilitate backporting, and kill
it in the next commit.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:21 +01:00
Ilya Dryomov
5418d0a2c8 libceph: no need for GFP_NOFS in ceph_monc_init()
It's called during inital setup, when everything should be allocated
with GFP_KERNEL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:21 +01:00
Ilya Dryomov
7af3ea189a libceph: stop allocating a new cipher on every crypto request
This is useless and more importantly not allowed on the writeback path,
because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
can recurse back into the filesystem:

    kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
     ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
     ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
     00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
     [<ffffffff951eb4e1>] ? schedule+0x31/0x80
     [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
     [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
     [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
     [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
     [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
     [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
     [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
     [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
     [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
     [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
     [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
     [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
     [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
     [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
     [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
     [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
     [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
     [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
     [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
     [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
     [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
     [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
     [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
     [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
     [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
     [<ffffffff950d40a0>] ? release_sock+0x40/0x90
     [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
     [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
     [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
     [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
     [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
     [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
     [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
     [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
     [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
     [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
     [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
     [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
     [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
     [<ffffffff94c95170>] ? process_one_work+0x410/0x410
     [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
     [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
     [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190

Allocating the cipher along with the key fixes the issue - as long the
key doesn't change, a single cipher context can be used concurrently in
multiple requests.

We still can't take that GFP_KERNEL allocation though.  Both
ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

Reported-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:20 +01:00
Ilya Dryomov
6db2304aab libceph: uninline ceph_crypto_key_destroy()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:20 +01:00
Ilya Dryomov
2b1e1a7cd0 libceph: remove now unused ceph_*{en,de}crypt*() functions
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:20 +01:00
Ilya Dryomov
e15fd0a11d libceph: switch ceph_x_decrypt() to ceph_crypt()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:19 +01:00
Ilya Dryomov
d03857c63b libceph: switch ceph_x_encrypt() to ceph_crypt()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:19 +01:00
Ilya Dryomov
4eb4517ce7 libceph: tweak calcu_signature() a little
- replace an ad-hoc array with a struct
- rename to calc_signature() for consistency

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:19 +01:00
Ilya Dryomov
7882a26d2e libceph: rename and align ceph_x_authorizer::reply_buf
It's going to be used as a temporary buffer for in-place en/decryption
with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12 23:09:19 +01:00