Commit graph

723985 commits

Author SHA1 Message Date
Dominik Brodowski
7a94b8c2ee nl80211: take RCU read lock when calling ieee80211_bss_get_ie()
As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both
the call to this function and any operation on this variable need
protection by the RCU read lock.

Fixes: 44905265bc ("nl80211: don't expose wdev->ssid for most interfaces")
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-15 09:15:04 +01:00
Johannes Berg
a48a52b7be cfg80211: fully initialize old channel for event
Paul reported that he got a report about undefined behaviour
that seems to me to originate in using uninitialized memory
when the channel structure here is used in the event code in
nl80211 later.

He never reported whether this fixed it, and I wasn't able
to trigger this so far, but we should do the right thing and
fully initialize the on-stack structure anyway.

Reported-by: Paul Menzel <pmenzel+linux-wireless@molgen.mpg.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-15 09:15:03 +01:00
David S. Miller
8155aedf51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-01-13

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Follow-up fix to the recent BPF out-of-bounds speculation
   fix that prevents max_entries overflows and an undefined
   behavior on 32 bit archs on index_mask calculation, from
   Daniel.

2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
   was otherwise throwing an unknown opcode warning in the
   interpreter, from Daniel.

3) Typo fix in one of the user facing verbose() messages that
   was added during the BPF out-of-bounds speculation fix,
   from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 11:01:33 -05:00
David S. Miller
5dd966c680 mlx5-fixes-2018-01-11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaV/8zAAoJEEg/ir3gV/o+Ui0IAJWzQe1HHNJeWUykLNYMbjbQ
 vxnnsbYd5Y2j/Q+tJdi6FJxEvC8kF78BJvDp37plV+lqZAXsvaGlLgyWcnY/XkFs
 byVjueQVyil/JUyIw5ciK8DqOOI/tdo5v51ZJ05JZYRdyP0E1KVwy9jXHEpVJKmm
 QlIfjrzbv5p6ydysKi6Z9NFKJ9CjSIyHh4Ew8VQKuzJ+AtoZ8L6XWGDHvoZl8DAb
 jweJwRU9bhBqyoWjRRGtTqfcNmwvhYcJwuzER9vKK4l4JAvzhl8PGzeWsE8wPYBW
 uKNRAPET7HQQsZblPU2l66oZVFMXhBsBnuY90jlPlmeHB/KLBLZp/m4EIzHEsoM=
 =EiuX
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2018-01-11

The following series includes fixes to mlx5 core and netdev driver.
To highlight we have two critical fixes in this series:
1st patch from Eran to address a fix for Host2BMC Breakage.

2nd patch from Saeed to address the RDMA IRQ vector affinity settings query
issue, the patch provides the correct mlx5_core implementation for RDMA to
correctly  query vector affinity.
I sent this patch privately to Sagi a week a go, so he could to test it
but I didn't hear from him.

All other patches are trivial misc fixes.
Please pull and let me know if there's any problem.

for -stable v4.14-y and later:
("net/mlx5: Fix get vector affinity helper function")
("{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed")

Note: Merging this series with net-next will produce the following conflict:
<<<<<<< HEAD
        u8         disable_local_lb[0x1];
        u8         reserved_at_3e2[0x1];
        u8         log_min_hairpin_wq_data_sz[0x5];
        u8         reserved_at_3e8[0x3];
=======
        u8         disable_local_lb_uc[0x1];
        u8         disable_local_lb_mc[0x1];
        u8         reserved_at_3e3[0x8];
>>>>>>> 359c96447ac2297fabe15ef30b60f3b4b71e7fd0

To resolve, use the following hunk:
i.e:
<<<<<<
        u8         disable_local_lb_uc[0x1];
        u8         disable_local_lb_mc[0x1];
        u8         log_min_hairpin_wq_data_sz[0x5];
        u8         reserved_at_3e8[0x3];
>>>>>>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-12 10:40:48 -05:00
David S. Miller
9c70f1a7fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-01-11

1) Don't allow to change the encap type on state updates.
   The encap type is set on state initialization and
   should not change anymore. From Herbert Xu.

2) Skip dead policies when rehashing to fix a
   slab-out-of-bounds bug in xfrm_hash_rebuild.
   From Florian Westphal.

3) Two buffer overread fixes in pfkey.
   From Eric Biggers.

4) Fix rcu usage in xfrm_get_type_offload,
   request_module can sleep, so can't be used
   under rcu_read_lock. From Sabrina Dubroca.

5) Fix an uninitialized lock in xfrm_trans_queue.
   Use __skb_queue_tail instead of skb_queue_tail
   in xfrm_trans_queue as we don't need the lock.
   From Herbert Xu.

6) Currently it is possible to create an xfrm state with an
   unknown encap type in ESP IPv4. Fix this by returning an
   error on unknown encap types. Also from Herbert Xu.

7) Fix sleeping inside a spinlock in xfrm_policy_cache_flush.
   From Florian Westphal.

8) Fix ESP GRO when the headers not fully in the linear part
   of the skb. We need to pull before we can access them.

9) Fix a skb leak on error in key_notify_policy.

10) Fix a race in the xdst pcpu cache, we need to
    run the resolver routines with bottom halfes
    off like the old flowcache did.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-12 10:32:49 -05:00
Linus Torvalds
1545dec46d Two rbd fixes for 4.12 and 4.2 issues respectively, marked for stable.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJaV5ghAAoJEEp/3jgCEfOL/bEH/3UopVYhLpHPqIAGCQuONuvG
 4imm/19uVIsbJMCHZ/bbksPPOaswqCKws5ScfIJKzg9vI8PQaQ28BnQh/vELHlmk
 lUgdXgVPaCoM/3UlCquxmBn5IooLt0t7LdCvMO2MvF3mf+jThyfwcoc1H2xe1Yh2
 k9dPQWcfNBY7jGTBGzqyuNwfg9DZSMyBRx4JmOsqlCI/GDcA8DsLHIykA4wRQRF+
 gtSCYeUuZsfukCk4Z1ImoYyfbtK/tYH4s9UOc5pEsd0s0NUs8bn7hgQn2QKWmQt4
 HsOzNTiAl/pXSRi8BMqPyFGVdZLT1cRkeo/FW1Lkxg5NqJmv005ebJ1Jx8YHW9A=
 =JTlC
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
  stable"

* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
  rbd: set max_segments to USHRT_MAX
  rbd: reacquire lock should update lock owner client id
2018-01-11 16:57:32 -08:00
Linus Torvalds
ab2781592a GPIO fix for v4.15: fix a raw vs elaborate GPIO descriptor bug
introduced by yours truly.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaVzO3AAoJEEEQszewGV1zATUQAJKL3aYxRbm+BzszDAivdZ9d
 Pb4OEmMPyUMnYyl6o4GvCqJrWhIFCW3F5QY/4gsFt+ZYf2wsVZGvGGca3nYPFYG6
 qCp+gIhpY32GlVW9lqVKVEplYkxYyLva+qCKzV+aiMar2kDV8YApEbv59X7R97ir
 lsxujweIcYgTQYuEJI/Wr3vwqWGZL/hPVqydfI/gSKVYdqHca2hIvz5mZpYDVM/f
 34hgX5V8z0yj8z4rUmgoShAicToK8de3sXChG7uBaFDGeOVX1LMgjUdFucQzHr0U
 aPUAWYsrqVi5ZczK5ClhKFCr5y1cfYPv0I9bYsyqHsHFMJq3HMI1eRoFy25POFTi
 RnelnThRQKPydPCVIGif3/xfDZOn/IVi9BN6AOxTjnAw/DA/gIavQUDcoh8j4EFK
 BePwSiKyJKg4CT3q/beIz2Lxx7JrCzioB3Q1kNGQipIv/X5LeT7lNVEmWP3X1Dif
 0od6eeOhFwTSK+sXO0pSHuTB+QhOatLBdoOaHbRnwWqkWMjwVxr2KAFrHzxBueXg
 qBGD+GyytZjda9wxBJxWTJNYQA75xBZM1eZUvlv2Fu3hzlDvHUcSVJ8rC82qJ6k7
 /xz2rJz86ic+MJ3p3qOdNlUOJ7pEskqiUcttK0/LWXeKhgw3+oTQEG/asGfrPeZg
 GAp/6k4XgaXI8iALKsWi
 =S8Fj
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
 "Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"

* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
2018-01-11 16:54:35 -08:00
Feras Daoud
237f258c42 net/mlx5e: Remove timestamp set from netdevice open flow
To avoid configuration override, timestamp set call will
be moved from the netdevice open flow to the init flow.
By this, a close-open procedure will not override the timestamp
configuration.
In addition, the change will rename mlx5e_timestamp_set function
to be mlx5e_timestamp_init.

Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:50 +02:00
Feras Daoud
afc98a0b46 net/mlx5: Update ptp_clock_event foreach PPS event
PPS event did not update ptp_clock_event fields, therefore,
timestamp value was not updated correctly. This fix updates the
event source and the timestamp value for each PPS event.

Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:50 +02:00
Gal Pressman
75b81ce719 net/mlx5e: Don't override netdev features field unless in error flow
Set features function sets dev->features in order to keep track of which
features were successfully changed and which weren't (in case the user
asks for more than one change in a single command).

This breaks the logic in __netdev_update_features which assumes that
dev->features is not changed on success and checks for diffs between
features and dev->features (diffs that might not exist at this point
because of the driver override).

The solution is to keep track of successful/failed feature changes and
assign them to dev->features in case of failure only.

Fixes: 0e405443e8 ("net/mlx5e: Improve set features ndo resiliency")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
Tariq Toukan
4b7d4363f1 net/mlx5e: Check support before TC swap in ETS init
Should not do the following swap between TCs 0 and 1
when max num of TCs is 1:
tclass[prio=0]=1, tclass[prio=1]=0, tclass[prio=i]=i (for i>1)

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
Tariq Toukan
97c8c3aa48 net/mlx5e: Add error print in ETS init
ETS initialization might fail, add a print to indicate
such failures.

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Gal Pressman
e556f6dd47 net/mlx5e: Keep updating ethtool statistics when the interface is down
ethtool statistics should be updated even when the interface is down
since it shows more than just netdev counters, which might change while
the logical link is down.
One useful use case, for example, is when running RoCE traffic over the
interface (while the logical link is down, but physical link is up) and
examining rx_prioX_bytes.

Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Maor Gottlieb
259bbc575c net/mlx5: Fix error handling in load one
We didn't store the result of mlx5_init_once, due to that
mlx5_load_one returned success on error.  Fix that.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Eran Ben Elisha
72f36be061 net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Alaa Hleihel
b6908c2960 net/mlx5: Fix memory leak in bad flow of mlx5_alloc_irq_vectors
Fix a memory leak where in case that pci_alloc_irq_vectors failed,
priv->irq_info was not released.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:46 +02:00
Saeed Mahameed
05e0cc84e0 net/mlx5: Fix get vector affinity helper function
mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
mlx5 users.  To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using legacy IRQ API, following
smp_affinity read procfs implementation.

Fixes: 231243c827 ("Revert mlx5: move affinity hints assignments to generic code")
Fixes: a435393aca ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:40 +02:00
Eran Ben Elisha
8978cc921f {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 00:52:42 +02:00
Linus Torvalds
cbd0a6a2cc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs regression fix from Al Viro/

Fix a leak in socket() introduced by commit 8e1611e235 ("make
sock_alloc_file() do sock_release() on failures").

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Fix a leak in socket(2) when we fail to allocate a file descriptor.
2018-01-10 17:55:42 -08:00
Linus Torvalds
64fce444f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) BPF speculation prevention and BPF_JIT_ALWAYS_ON, from Alexei
    Starovoitov.

 2) Revert dev_get_random_name() changes as adjust the error code
    returns seen by userspace definitely breaks stuff.

 3) Fix TX DMA map/unmap on older iwlwifi devices, from Emmanuel
    Grumbach.

 4) From wrong AF family when requesting sock diag modules, from Andrii
    Vladyka.

 5) Don't add new ipv6 routes attached to the null_entry, from Wei Wang.

 6) Some SCTP sockopt length fixes from Marcelo Ricardo Leitner.

 7) Don't leak when removing VLAN ID 0, from Cong Wang.

 8) Hey there's a potential leak in ipv6_make_skb() too, from Eric
    Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
  ipv6: sr: fix TLVs not being copied using setsockopt
  ipv6: fix possible mem leaks in ipv6_make_skb()
  mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
  mlxsw: pci: Wait after reset before accessing HW
  nfp: always unmask aux interrupts at init
  8021q: fix a memory leak for VLAN 0 device
  of_mdio: avoid MDIO bus removal when a PHY is missing
  caif_usb: use strlcpy() instead of strncpy()
  doc: clarification about setting SO_ZEROCOPY
  net: gianfar_ptp: move set_fipers() to spinlock protecting area
  sctp: make use of pre-calculated len
  sctp: add a ceiling to optlen in some sockopts
  sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
  bpf: introduce BPF_JIT_ALWAYS_ON config
  bpf: avoid false sharing of map refcount with max_entries
  ipv6: remove null_entry before adding default route
  SolutionEngine771x: add Ether TSU resource
  SolutionEngine771x: fix Ether platform data
  docs-rst: networking: wire up msg_zerocopy
  net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
  ...
2018-01-10 17:53:18 -08:00
Al Viro
ce4bb04cae Fix a leak in socket(2) when we fail to allocate a file descriptor.
Got broken by "make sock_alloc_file() do sock_release() on failures" -
cleanup after sock_map_fd() failure got pulled all the way into
sock_alloc_file(), but it used to serve the case when sock_map_fd()
failed *before* getting to sock_alloc_file() as well, and that got
lost.  Trivial to fix, fortunately.

Fixes: 8e1611e235 (make sock_alloc_file() do sock_release() on failures)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-10 18:47:05 -05:00
Daniel Borkmann
bbeb6e4323 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-10 14:46:39 -08:00
Daniel Borkmann
7891a87efc bpf: arsh is not supported in 32 bit alu thus reject it
The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-10 14:42:22 -08:00
Colin Ian King
4095034393 bpf: fix spelling mistake: "obusing" -> "abusing"
Trivial fix to spelling mistake in error message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-10 14:32:59 -08:00
Mathieu Xhonneux
ccc12b11c5 ipv6: sr: fix TLVs not being copied using setsockopt
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header
to a socket through setsockopt, but the current implementation doesn't
copy possible TLVs at the end of the SRH received from userspace.

Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr))
{ ... } will never complete since the len and type fields of a possible
HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error,
and the HMAC will not be computed.

This commit adds a memcpy in case TLVs have been appended to the SRH.

Fixes: a149e7c7ce ("ipv6: sr: add support for SRH injection through setsockopt")
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:03:55 -05:00
Eric Dumazet
862c03ee1d ipv6: fix possible mem leaks in ipv6_make_skb()
ip6_setup_cork() might return an error, while memory allocations have
been done and must be rolled back.

Fixes: 6422398c2a ("ipv6: introduce ipv6_make_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Reported-by: Mike Maloney <maloney@google.com>
Acked-by:  Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:01:25 -05:00
David S. Miller
8f3d194600 Merge branch 'mlxsw-couple-of-fixes'
Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of small fixes for mlxsw driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:23 -05:00
Jiri Pirko
db84924c4f mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.

Fixes: 96f17e0776 ("mlxsw: spectrum: Support RED qdisc offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:23 -05:00
Yuval Mintz
8e033a93b3 mlxsw: pci: Wait after reset before accessing HW
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.

Wait for 100ms before starting the polling.

Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:22 -05:00
Jakub Kicinski
fc2336505f nfp: always unmask aux interrupts at init
The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area.  The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).

Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.

Always unmask the auxiliary interrupts after request_irq().  On the
remove path add missing PCI write flush before free_irq().

Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:50:04 -05:00
Cong Wang
78bbb15f22 8021q: fix a memory leak for VLAN 0 device
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.

Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: ad1afb0039 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:31:07 -05:00
David S. Miller
6ade262b77 wireless-drivers fixes for 4.15
Hopefully the last set of fixes for 4.15.
 
 iwlwifi
 
 * fix DMA mapping regression since v4.14
 
 wcn36xx
 
 * fix dynamic power save which has been broken since the driver was commited
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaVLeaAAoJEG4XJFUm622bYBUH/2NhQwUJbKIrbxYhYpo0d++8
 GK3TjxpzTByCe0nXGUT5iuDaY72i9C2UoLhFQ5smVg04pE0IXQQjZus6vSGx4biz
 GUave/SkzL0EruUXjXLBYWiYDND4iynk82gWX2/Lh7qGoT2SQfD5cKz0cMdE5NrW
 E7Q1CMiaoB0i9jcksaU2uWA0XPwISxl61kU2dXuKHQOJ1CW1goI/YIHsBajshHmi
 ZEfMqxZFE+jz2Kkp4tKhvG/Xva0ylJv8bwK8CMK6MqA8oa3xdgOhv67E9mm7IGpD
 ETrRLJNnWGJlnod5u7QOZWcS01gAgT5whCqDl/lTEty0823kvjMQdwckQaIU+AE=
 =me16
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2018-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.15

Hopefully the last set of fixes for 4.15.

iwlwifi

* fix DMA mapping regression since v4.14

wcn36xx

* fix dynamic power save which has been broken since the driver was commited
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:08:46 -05:00
Madalin Bucur
95f566de02 of_mdio: avoid MDIO bus removal when a PHY is missing
If one of the child devices is missing the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except ENODEV to avoid it.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:07:47 -05:00
Xiongfeng Wang
b0d55b5bc7 caif_usb: use strlcpy() instead of strncpy()
gcc-8 reports

net/caif/caif_usb.c: In function 'cfusbl_device_notify':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]

The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:06:14 -05:00
Kornilios Kourtis
af60d61fa8 doc: clarification about setting SO_ZEROCOPY
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:01:49 -05:00
Yangbo Lu
11d827a993 net: gianfar_ptp: move set_fipers() to spinlock protecting area
set_fipers() calling should be protected by spinlock in
case that any interrupt breaks related registers setting
and the function we expect. This patch is to move set_fipers()
to spinlock protecting area in ptp_gianfar_adjtime().

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:54:13 -05:00
David S. Miller
e5143f863c Merge branch 'sctp-Some-sockopt-optlen-fixes'
Marcelo Ricardo Leitner says:

====================
sctp: Some sockopt optlen fixes

Hangbin Liu reported that some SCTP sockopt are allowing the user to get
the kernel to allocate really large buffers by not having a ceiling on
optlen.

This patchset address this issue (in patch 2), replace an GFP_ATOMIC
that isn't needed and avoid calculating the option size multiple times
in some setsockopt.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:53:23 -05:00
Marcelo Ricardo Leitner
c76f97c99a sctp: make use of pre-calculated len
Some sockopt handling functions were calculating the length of the
buffer to be written to userspace and then calculating it again when
actually writing the buffer, which could lead to some write not using
an up-to-date length.

This patch updates such places to just make use of the len variable.

Also, replace some sizeof(type) to sizeof(var).

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:53:22 -05:00
Marcelo Ricardo Leitner
5960cefab9 sctp: add a ceiling to optlen in some sockopts
Hangbin Liu reported that some sockopt calls could cause the kernel to log
a warning on memory allocation failure if the user supplied a large optlen
value. That is because some of them called memdup_user() without a ceiling
on optlen, allowing it to try to allocate really large buffers.

This patch adds a ceiling by limiting optlen to the maximum allowed that
would still make sense for these sockopt.

Reported-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:53:22 -05:00
Marcelo Ricardo Leitner
2e83acb970 sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
So replace it with GFP_USER and also add __GFP_NOWARN.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:53:22 -05:00
Linus Torvalds
5f615b97cd sound fixes for 4.15-rc8
A collection of the last-minute small PCM fixes:
 - A workaround for the recent regression wrt PulseAudio
 - Removal of spurious WARN_ON() that is triggered by syzkaller
 - Fixes for aloop, hardening racy accesses
 - Fixes in PCM OSS emulation wrt the unabortable loops that may cause
   RCU stall
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlpUeWoOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8rMxAAy+XJNWigvkWHd79ttKeAmndia/u9d+T6Ge/I
 VI/SJSy8vhnGO0YNf/AHEs6vtad73XnXP76x1H3TkCsrDxykfhKogCvp0Aat/Ji7
 LQFkhQKsaEdACm2TlPxmxpO64sYB8UjvcZBFS82tCmNCldMkwi8T+DDDHocP0A0D
 pOQogjffqPBZdk7X1hJxoVKOm95GI1ms09+JPrLl47aa6mLIvNxa81RGnrVK5blE
 +kYZQAblweGN8RsMVWqyrnxgRatF59UbV6JIKui/8KD2AXl3Hya/Dn2aFWtMqqH8
 p8siLsUI+tACPucNk7tMt9UjHEy7yGK02hClhYVZG6vZ81nSoJsJFTdwXBMKjrfy
 Fa1bBb8quM6WfBEHXB7YISulUrrc2nftkPhB/zIa5E9arkHWY4FL7jhdUTEAjkgr
 D0Ka3Q/PtdXxmK+NBqUpoiqDHoOQeA5HG+njsz5L0xSbxoxMy8guyxSaoeF4BOnW
 KbrVbzcJzSxDWPYGbKmeLEYHW8P3FOKNv9SI/WZErmyjQkeMiq7AuP93yYACFEyj
 LhSAxBZ00sStl6IgM4Unw6p4Gi0SOawQfADDG4Arfr/fRA52l9wmpaUwU3uJ1RMn
 gLvLfJkBbs/MwBjD5BPxfdjKIuREvMdUBwl/hZk1zp5d2ay0lAl6toNcee//MuHf
 DKd1t3I=
 =Fz+p
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of the last-minute small PCM fixes:

   - A workaround for the recent regression wrt PulseAudio

   - Removal of spurious WARN_ON() that is triggered by syzkaller

   - Fixes for aloop, hardening racy accesses

   - Fixes in PCM OSS emulation wrt the unabortable loops that may cause
     RCU stall"

* tag 'sound-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
  ALSA: pcm: Abort properly at pending signal in OSS read/write loops
  ALSA: aloop: Fix racy hw constraints adjustment
  ALSA: aloop: Fix inconsistent format due to incomplete rule
  ALSA: aloop: Release cable upon open error path
  ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
  ALSA: pcm: Add missing error checks in OSS emulation plugin builder
  ALSA: pcm: Remove incorrect snd_BUG_ON() usages
2018-01-10 11:18:31 -08:00
David S. Miller
661e4e33a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-01-09

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Prevent out-of-bounds speculation in BPF maps by masking the
   index after bounds checks in order to fix spectre v1, and
   add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for
   removing the BPF interpreter from the kernel in favor of
   JIT-only mode to make spectre v2 harder, from Alexei.

2) Remove false sharing of map refcount with max_entries which
   was used in spectre v1, from Daniel.

3) Add a missing NULL psock check in sockmap in order to fix
   a race, from John.

4) Fix test_align BPF selftest case since a recent change in
   verifier rejects the bit-wise arithmetic on pointers
   earlier but test_align update was missing, from Alexei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 11:17:21 -05:00
Geert Uytterhoeven
1e77fc8211 gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Since commit f11a04464a ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:

    rtc-s35390a 0-0030: error resetting chip
    rtc-s35390a: probe of 0-0030 failed with error -5

More debug code reveals:

    i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
    i2c i2c-0: NAK from device addr 0x30 msg #0
    s35390a_get_reg: ret = -6

Commit 02e479808b ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.

Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().

Fixes: 02e479808b ("gpio: Alter semantics of *raw* operations to actually be raw")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-01-10 14:17:17 +01:00
Steffen Klassert
76a4201191 xfrm: Fix a race in the xdst pcpu cache.
We need to run xfrm_resolve_and_create_bundle() with
bottom halves off. Otherwise we may reuse an already
released dst_enty when the xfrm lookup functions are
called from process context.

Fixes: c30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache")
Reported-by: Darius Ski <darius.ski@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-10 12:14:28 +01:00
Steffen Klassert
1e532d2b49 af_key: Fix memory leak in key_notify_policy.
We leak the allocated out_skb in case
pfkey_xfrm_policy2msg() fails. Fix this
by freeing it on error.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-10 09:45:11 +01:00
Linus Torvalds
cf1fb15823 RISC-V changes for 4.15-rc8
This contains what I hope are the last RISC-V changes to go into 4.15.
 I know it's a bit last minute, but I think they're all fairly small
 changes:
 
 * SR_* constants have been renamed to match the latest ISA
   specification.
 * Some CONFIG_MMU #ifdef cruft has been removed.  We've never supported
   !CONFIG_MMU.
 * __NR_riscv_flush_icache is now visible to userspace.  We were hoping
   to avoid making this public in order to force userspace to call the
   vDSO entry, but it looks like QEMU's user-mode emulation doesn't want
   to emulate a vDSO.  In order to allow glibc to fall back to a system
   call when the vDSO entry doesn't exist we're just
 * Our defconfig is no long empty.  This is another one that just slipped
   through the cracks.  The defconfig isn't perfect, but it's at least
   close to what users will want for the first RISC-V development board.
   Getting closer is kind of splitting hairs here: none of the RISC-V
   specific drivers are in yet, so it's not like things will boot out of
   the box.
 
 The only one that's strictly necessary is the __NR_riscv_flush_icache
 change, as I want that to be part of the public API starting from our
 first kernel so nobody has to worry about it.  The others are nice to
 haves, but they seem sane for 4.15 to me.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlpU/UcTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQdy5D/9fcTwXTk98U2gSoR4Dv25tztqbNMhw
 +Lae5EeIqAaPI4xfyLGldJe0BWAJaouZWIY5xkB5JWzsdYPx/jYgC+SbwI/3aGVy
 VjcU0d4haZtz2kdm0Y0ZKIGg91vDlULoVvcxrM8Jff0gDmyKoT1OjwKpt3esyhmN
 Vc+iC0FxtJow/xIaFlnPa42qh/pFkcLDmmY/Im6N8IEcyHBT6vCDnD3CgCFY/hdu
 9vcWJDvFBj4SFwL8y+ajspQ4tPzDt4Ko+3NLxtEv+19y3NEgLm+shbxv/J8AVO8O
 BvBr51QfggM2rAqGzCa4nEZZR7Roxgg9bJVQARXyzX1tUhtBEz9+eUArJ0tzMtbx
 GyXYY5NwyupDJ/MA9yn+GqYlLNnS2yL2y0zIBJehi/37+KpAFtH/cRnA58sXViqw
 IKGhKW7JCGU3/xyW+RtuY3N5urU18+qE4CZRLtI5QN0QRcTWLhqqQvQRud86HqqD
 g4KPo6g9Z6Ak9Xu81n/liIExp3Vp2kpQUts1lCF1D+4WYRwpb4Mqy4HiOCSf/OO2
 wOuX5HY+tbS8yvupgYjszTXaYDn35RoGkcjK9o1Lkq9RgI5kzHDyaQrSK/c/oAzn
 A7cJ2z7dBaV0W4O7R+2SJ2k9DHw1db/WVf19pKVjSi5osSoUds5w1YxHK25cSBUz
 +47LVCgkQI/Scw==
 =PhUK
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains what I hope are the last RISC-V changes to go into 4.15.
  I know it's a bit last minute, but I think they're all fairly small
  changes:

   - SR_* constants have been renamed to match the latest ISA
     specification.

   - Some CONFIG_MMU #ifdef cruft has been removed. We've never
     supported !CONFIG_MMU.

   - __NR_riscv_flush_icache is now visible to userspace. We were hoping
     to avoid making this public in order to force userspace to call the
     vDSO entry, but it looks like QEMU's user-mode emulation doesn't
     want to emulate a vDSO. In order to allow glibc to fall back to a
     system call when the vDSO entry doesn't exist we're just

   - Our defconfig is no long empty. This is another one that just
     slipped through the cracks. The defconfig isn't perfect, but it's
     at least close to what users will want for the first RISC-V
     development board. Getting closer is kind of splitting hairs here:
     none of the RISC-V specific drivers are in yet, so it's not like
     things will boot out of the box.

  The only one that's strictly necessary is the __NR_riscv_flush_icache
  change, as I want that to be part of the public API starting from our
  first kernel so nobody has to worry about it. The others are nice to
  haves, but they seem sane for 4.15 to me"

* tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
  riscv: rename SR_* constants to match the spec
  riscv: remove CONFIG_MMU ifdefs
  RISC-V: Make __NR_riscv_flush_icache visible to userspace
  RISC-V: Add a basic defconfig
2018-01-09 15:45:06 -08:00
Linus Torvalds
44cae9b209 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "Another round of MIPS fixes for 4.15.

   - Maciej Rozycki found another series of FP issues which requires a
     seven part series to restructure and fix.

   - James fixes a warning about .set mt which gas doesn't like when
     building for R1 processors"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
  MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
  MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
  MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
  MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
  MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
  MIPS: Factor out NT_PRFPREG regset access helpers
  MIPS: CPS: Fix r1 .set mt assembler warning
2018-01-09 15:43:13 -08:00
Alexei Starovoitov
290af86629 bpf: introduce BPF_JIT_ALWAYS_ON config
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-09 22:25:26 +01:00
Linus Torvalds
d476c5334f Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes that should go into this release. This contains:

   - An NVMe pull request from Christoph, with a few critical fixes for
     NVMe.

   - A block drain queue fix from Ming.

   - The concurrent lo_open/release fix for loop"

* 'for-linus' of git://git.kernel.dk/linux-block:
  loop: fix concurrent lo_open/lo_release
  block: drain queue before waiting for q_usage_counter becoming zero
  nvme-fcloop: avoid possible uninitialized variable warning
  nvme-mpath: fix last path removal during traffic
  nvme-rdma: fix concurrent reset and reconnect
  nvme: fix sector units when going between formats
  nvme-pci: move use_sgl initialization to nvme_init_iod()
2018-01-09 11:20:55 -08:00
Daniel Borkmann
be95a845cc bpf: avoid false sharing of map refcount with max_entries
In addition to commit b2157399cc ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.

Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-09 10:07:30 -08:00