Commit graph

662481 commits

Author SHA1 Message Date
Daniel Borkmann
79adffcd64 bpf, verifier: fix rejection of unaligned access checks for map_value_adj
Currently, the verifier doesn't reject unaligned access for map_value_adj
register types. Commit 484611357c ("bpf: allow access into map value
arrays") added logic to check_ptr_alignment() extending it from PTR_TO_PACKET
to also PTR_TO_MAP_VALUE_ADJ, but for PTR_TO_MAP_VALUE_ADJ no enforcement
is in place, because reg->id for PTR_TO_MAP_VALUE_ADJ reg types is never
non-zero, meaning, we can cause BPF_H/_W/_DW-based unaligned access for
architectures not supporting efficient unaligned access, and thus worst
case could raise exceptions on some archs that are unable to correct the
unaligned access or perform a different memory access to the actual
requested one and such.

i) Unaligned load with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
   on r0 (map_value_adj):

   0: (bf) r2 = r10
   1: (07) r2 += -8
   2: (7a) *(u64 *)(r2 +0) = 0
   3: (18) r1 = 0x42533a00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+11
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
   7: (61) r1 = *(u32 *)(r0 +0)
   8: (35) if r1 >= 0xb goto pc+9
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=10 R10=fp
   9: (07) r0 += 3
  10: (79) r7 = *(u64 *)(r0 +0)
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R10=fp
  11: (79) r7 = *(u64 *)(r0 +2)
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R7=inv R10=fp
  [...]

ii) Unaligned store with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    on r0 (map_value_adj):

   0: (bf) r2 = r10
   1: (07) r2 += -8
   2: (7a) *(u64 *)(r2 +0) = 0
   3: (18) r1 = 0x4df16a00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+19
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
   7: (07) r0 += 3
   8: (7a) *(u64 *)(r0 +0) = 42
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
   9: (7a) *(u64 *)(r0 +2) = 43
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
  10: (7a) *(u64 *)(r0 -2) = 44
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
  [...]

For the PTR_TO_PACKET type, reg->id is initially zero when skb->data
was fetched, it later receives a reg->id from env->id_gen generator
once another register with UNKNOWN_VALUE type was added to it via
check_packet_ptr_add(). The purpose of this reg->id is twofold: i) it
is used in find_good_pkt_pointers() for setting the allowed access
range for regs with PTR_TO_PACKET of same id once verifier matched
on data/data_end tests, and ii) for check_ptr_alignment() to determine
that when not having efficient unaligned access and register with
UNKNOWN_VALUE was added to PTR_TO_PACKET, that we're only allowed
to access the content bytewise due to unknown unalignment. reg->id
was never intended for PTR_TO_MAP_VALUE{,_ADJ} types and thus is
always zero, the only marking is in PTR_TO_MAP_VALUE_OR_NULL that
was added after 484611357c via 57a09bf0a4 ("bpf: Detect identical
PTR_TO_MAP_VALUE_OR_NULL registers"). Above tests will fail for
non-root environment due to prohibited pointer arithmetic.

The fix splits register-type specific checks into their own helper
instead of keeping them combined, so we don't run into a similar
issue in future once we extend check_ptr_alignment() further and
forget to add reg->type checks for some of the checks.

Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:36:37 -07:00
Daniel Borkmann
fce366a9dd bpf, verifier: fix alu ops against map_value{, _adj} register types
While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:

 i) BPF_AND on r0 (map_value_adj):

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (7a) *(u64 *)(r2 +0) = 0
  3: (18) r1 = 0xbf842a00
  5: (85) call bpf_map_lookup_elem#1
  6: (15) if r0 == 0x0 goto pc+2
   R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  7: (57) r0 &= 8
  8: (7a) *(u64 *)(r0 +0) = 22
   R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
  9: (95) exit

  from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
  9: (95) exit
  processed 10 insns

ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (7a) *(u64 *)(r2 +0) = 0
  3: (18) r1 = 0xc24eee00
  5: (85) call bpf_map_lookup_elem#1
  6: (15) if r0 == 0x0 goto pc+2
   R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  7: (04) (u32) r0 += (u32) 0
  8: (7a) *(u64 *)(r0 +0) = 22
   R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  9: (95) exit

  from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
  9: (95) exit
  processed 10 insns

Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
all the test cases coming with 484611357c ("bpf: allow access
into map value arrays") perform BPF_ADD only on the destination
register that is PTR_TO_MAP_VALUE_ADJ.

Only for UNKNOWN_VALUE register types such operations make sense,
f.e. with unknown memory content fetched initially from a constant
offset from the map value memory into a register. That register is
then later tested against lower / upper bounds, so that the verifier
can then do the tracking of min_value / max_value, and properly
check once that UNKNOWN_VALUE register is added to the destination
register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
original use-case is solving. Note, tracking on what is being
added is done through adjust_reg_min_max_vals() and later access
to the map value enforced with these boundaries and the given offset
from the insn through check_map_access_adj().

Tests will fail for non-root environment due to prohibited pointer
arithmetic, in particular in check_alu_op(), we bail out on the
is_pointer_value() check on the dst_reg (which is false in root
case as we allow for pointer arithmetic via env->allow_ptr_leaks).

Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
mode BPF_ADD. The test_verifier suite runs fine after the patch
and it also rejects mentioned test cases.

Fixes: 484611357c ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:36:37 -07:00
René Rebe
d5b07ccc1b r8152: The Microsoft Surface docks also use R8152 v2
Without this the generic cdc_ether grabs the device,
and does not really work.

Signed-off-by: René Rebe <rene@exactcode.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:19:31 -07:00
Yi-Hung Wei
6f56f6186c openvswitch: Fix ovs_flow_key_update()
ovs_flow_key_update() is called when the flow key is invalid, and it is
used to update and revalidate the flow key. Commit 329f45bc4f
("openvswitch: add mac_proto field to the flow key") introduces mac_proto
field to flow key and use it to determine whether the flow key is valid.
However, the commit does not update the code path in ovs_flow_key_update()
to revalidate the flow key which may cause BUG_ON() on execute_recirc().
This patch addresses the aforementioned issue.

Fixes: 329f45bc4f ("openvswitch: add mac_proto field to the flow key")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:16:46 -07:00
Mark Brown
3af887c38f net/faraday: Explicitly include linux/of.h and linux/property.h
This driver uses interfaces from linux/of.h and linux/property.h but
relies on implict inclusion of those headers which means that changes in
other headers could break the build, as happened in -next for arm today.
Add a explicit includes.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:14:00 -07:00
Daode Huang
b917078c1c net: hns: Add ACPI support to check SFP present
The current code only supports DT to check SFP present.
This patch adds ACPI support as well.

Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-01 12:10:58 -07:00
Suresh Reddy
0b98ca2a45 be2net: Fix endian issue in logical link config command
Use cpu_to_le32() for link_config variable in set_logical_link_config
command as this variable is of type u32.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 15:57:33 -07:00
Xin Long
3dbcc105d5 sctp: alloc stream info when initializing asoc
When sending a msg without asoc established, sctp will send INIT packet
first and then enqueue chunks.

Before receiving INIT_ACK, stream info is not yet alloced. But enqueuing
chunks needs to access stream info, like out stream state and out stream
cnt.

This patch is to fix it by allocing out stream info when initializing an
asoc, allocing in stream and re-allocing out stream when processing init.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:08:47 -07:00
Andrey Konovalov
bcc5364bdc net/packet: fix overflow in check for tp_reserve
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.

Fix by checking that tp_reserve <= INT_MAX on assign.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:04:00 -07:00
Andrey Konovalov
8f8d28e4d6 net/packet: fix overflow in check for tp_frame_nr
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.

Add a check that tp_block_size * tp_block_nr <= UINT_MAX.

Since frames_per_block <= tp_block_size, the expression would
never overflow.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:03:59 -07:00
Andrey Konovalov
2b6867c2ce net/packet: fix overflow in check for priv area size
Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less then the other doesn't always work
(both of them are unsigned ints).

Compare them as is instead.

Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-30 11:03:59 -07:00
David S. Miller
8f1f7eeb22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a rather large update with Netfilter
fixes, specifically targeted to incorrect RCU usage in several spots and
the userspace conntrack helper infrastructure (nfnetlink_cthelper),
more specifically they are:

1) expect_class_max is incorrect set via cthelper, as in kernel semantics
   mandate that this represents the array of expectation classes minus 1.
   Patch from Liping Zhang.

2) Expectation policy updates via cthelper are currently broken for several
   reasons: This code allows illegal changes in the policy such as changing
   the number of expeciation classes, it is leaking the updated policy and
   such update occurs with no RCU protection at all. Fix this by adding a
   new nfnl_cthelper_update_policy() that describes what is really legal on
   the update path.

3) Fix several memory leaks in cthelper, from Jeffy Chen.

4) synchronize_rcu() is missing in the removal path of several modules,
   this may lead to races since CPU may still be running on code that has
   just gone. Also from Liping Zhang.

5) Don't use the helper hashtable from cthelper, it is not safe to walk
   over those bits without the helper mutex. Fix this by introducing a
   new independent list for userspace helpers. From Liping Zhang.

6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no
   packets are walking on any conntrack extension that is gone after
   module removal, again from Liping.

7) nf_nat_snmp may crash if we fail to unregister the helper due to
   accidental leftover code, from Gao Feng.

8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:35:25 -07:00
Zakharov Vlad
358e78b5f4 ezchip: nps_enet: check if napi has been completed
After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as current polling have not been completed yet.

nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in interrupt
hanlder chain as we had never been able to disabale ethernet
interrupts again.

So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.

Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:28:16 -07:00
David S. Miller
a1801cc83f Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Small misc. fixes.

Fix a NULL pointer crash in open failure path, wrong arguments when
printing error messages, and a DMA unmap bug in XDP shutdown path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:05:34 -07:00
Michael Chan
3ed3a83e3f bnxt_en: Fix DMA unmapping of the RX buffers in XDP mode during shutdown.
In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.

Fixes: c61fb99cae ("bnxt_en: Add RX page mode support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:05:34 -07:00
Sankar Patchineelam
23e12c8934 bnxt_en: Correct the order of arguments to netdev_err() in bnxt_set_tpa()
Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:05:33 -07:00
Sankar Patchineelam
2247925f09 bnxt_en: Fix NULL pointer dereference in reopen failure path
Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which inturn
calls bnxt_disable_int().  In this routine, the code segment

if (ring->fw_ring_id != INVALID_HW_RING_ID)
   BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);

results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.

The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.

Signed-off-by: Sankar Patchineelam <sankar.patchineelam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 14:05:33 -07:00
Guillaume Nault
e91793bb61 l2tp: purge socket queues in the .destruct() callback
The Rx path may grab the socket right before pppol2tp_release(), but
nothing guarantees that it will enqueue packets before
skb_queue_purge(). Therefore, the socket can be destroyed without its
queues fully purged.

Fix this by purging queues in pppol2tp_session_destruct() where we're
guaranteed nothing is still referencing the socket.

Fixes: 9e9cb6221a ("l2tp: fix userspace reception on plain L2TP sockets")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 09:26:28 -07:00
Guillaume Nault
94d7ee0baa l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6
The code following l2tp_tunnel_find() expects that a new reference is
held on sk. Either sk_receive_skb() or the discard_put error path will
drop a reference from the tunnel's socket.

This issue exists in both l2tp_ip and l2tp_ip6.

Fixes: a3c18422a4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 09:26:28 -07:00
Liping Zhang
77c1c03c5b netfilter: nfnetlink_queue: fix secctx memory leak
We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.

Fixes: ef493bd930 ("netfilter: nfnetlink_queue: add security context information")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-29 12:20:50 +02:00
Jarno Rajahalme
b768b16de5 openvswitch: Fix refcount leak on force commit.
The reference count held for skb needs to be released when the skb's
nfct pointer is cleared regardless of if nf_ct_delete() is called or
not.

Failing to release the skb's reference cound led to deferred conntrack
cleanup spinning forever within nf_conntrack_cleanup_net_list() when
cleaning up a network namespace:

   kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 [120] R ==> rcu_preempt:7 [120]
   kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: <stack trace>
=> ___preempt_schedule (ffffffffa001ed36)
=> _raw_spin_unlock_bh (ffffffffa0713290)
=> nf_ct_iterate_cleanup (ffffffffc00a4454)
=> nf_conntrack_cleanup_net_list (ffffffffc00a5e1e)
=> nf_conntrack_pernet_exit (ffffffffc00a63dd)
=> ops_exit_list.isra.1 (ffffffffa06075f3)
=> cleanup_net (ffffffffa0607df0)
=> process_one_work (ffffffffa0084c31)
=> worker_thread (ffffffffa008592b)
=> kthread (ffffffffa008bee2)
=> ret_from_fork (ffffffffa071b67c)

Fixes: dd41d33f0b ("openvswitch: Add force commit.")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 21:56:11 -07:00
Arnd Bergmann
16b8b6de32 rocker: fix Wmaybe-uninitialized false-positive
gcc-7 reports a warning that earlier versions did not have:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 'ofdpa_port_stp_update':
arch/x86/include/asm/string_32.h:79:22: error: '*((void *)&prev_ctrls+4)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   *((short *)to + 2) = *((short *)from + 2);
   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/rocker/rocker_ofdpa.c:2218:7: note: '*((void *)&prev_ctrls+4)' was declared here

This is clearly a variation of the warning about 'prev_state' that
was shut up using uninitialized_var().

We can slightly simplify the code and get rid of the warning by unconditionally
saving the prev_state and prev_ctrls variables. The inlined memcpy is not
particularly expensive here, as it just has to read five bytes from one or
two cache lines.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 21:42:32 -07:00
Talat Batheesh
e497ec680c net/mlx5: Avoid dereferencing uninitialized pointer
In NETDEV_CHANGEUPPER event the upper_info field is valid
only when linking is true. Otherwise it should be ignored.

Fixes: 7907f23adc (net/mlx5: Implement RoCE LAG feature)
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 18:07:15 -07:00
Jonas Jensen
c2b341a620 net: moxa: fix TX overrun memory leak
moxart_mac_start_xmit() doesn't care where tx_tail is, tx_head can
catch and pass tx_tail, which is bad because moxart_tx_finished()
isn't guaranteed to catch up on freeing resources from tx_tail.

Add a check in moxart_mac_start_xmit() stopping the queue at the
end of the circular buffer. Also add a check in moxart_tx_finished()
waking the queue if the buffer has TX_WAKE_THRESHOLD or more
free descriptors.

While we're at it, move spin_lock_irq() to happen before our
descriptor pointer is assigned in moxart_mac_start_xmit().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=99451

Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 18:02:05 -07:00
Arnd Bergmann
af109a2cf6 isdn: kcapi: avoid uninitialized data
gcc-7 points out that the AVMB1_ADDCARD ioctl results in an unintialized
value ending up in the cardnr parameter:

drivers/isdn/capi/kcapi.c: In function 'old_capi_manufacturer':
drivers/isdn/capi/kcapi.c:1042:24: error: 'cdef.cardnr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   cparams.cardnr = cdef.cardnr;

This has been broken since before the start of the git history, so
either the value is not used for anything important, or the ioctl
command doesn't get called in practice.

Setting the cardnr to zero avoids the warning and makes sure
we have consistent behavior.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:59:33 -07:00
Xin Long
f9ba3501d5 sctp: change to save MSG_MORE flag into assoc
David Laight noticed the support for MSG_MORE with datamsg->force_delay
didn't really work as we expected, as the first msg with MSG_MORE set
would always block the following chunks' dequeuing.

This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
David Laight suggested.

asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
All chunks in queue would not be sent out if asoc->force_delay is set
by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag
clears asoc->force_delay.

Note that this change would not affect the flush is generated by other
triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc.

v1->v2:
  Not clear asoc->force_delay after sending the msg with MSG_MORE flag.

Fixes: 4ea0c32f5f ("sctp: add support for MSG_MORE")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Laight <david.laight@aculab.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:56:15 -07:00
Mark Rutland
ffefb6f4d6 net: ipconfig: fix ic_close_devs() use-after-free
Our chosen ic_dev may be anywhere in our list of ic_devs, and we may
free it before attempting to close others. When we compare d->dev and
ic_dev->dev, we're potentially dereferencing memory returned to the
allocator. This causes KASAN to scream for each subsequent ic_dev we
check.

As there's a 1-1 mapping between ic_devs and netdevs, we can instead
compare d and ic_dev directly, which implicitly handles the !ic_dev
case, and avoids the use-after-free. The ic_dev pointer may be stale,
but we will not dereference it.

Original splat:

[    6.487446] ==================================================================
[    6.494693] BUG: KASAN: use-after-free in ic_close_devs+0xc4/0x154 at addr ffff800367efa708
[    6.503013] Read of size 8 by task swapper/0/1
[    6.507452] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-00002-gda42158 #8
[    6.514993] Hardware name: AppliedMicro Mustang/Mustang, BIOS 3.05.05-beta_rc Jan 27 2016
[    6.523138] Call trace:
[    6.525590] [<ffff200008094778>] dump_backtrace+0x0/0x570
[    6.530976] [<ffff200008094d08>] show_stack+0x20/0x30
[    6.536017] [<ffff200008bee928>] dump_stack+0x120/0x188
[    6.541231] [<ffff20000856d5e4>] kasan_object_err+0x24/0xa0
[    6.546790] [<ffff20000856d924>] kasan_report_error+0x244/0x738
[    6.552695] [<ffff20000856dfec>] __asan_report_load8_noabort+0x54/0x80
[    6.559204] [<ffff20000aae86ac>] ic_close_devs+0xc4/0x154
[    6.564590] [<ffff20000aaedbac>] ip_auto_config+0x2ed4/0x2f1c
[    6.570321] [<ffff200008084b04>] do_one_initcall+0xcc/0x370
[    6.575882] [<ffff20000aa31de8>] kernel_init_freeable+0x5f8/0x6c4
[    6.581959] [<ffff20000a16df00>] kernel_init+0x18/0x190
[    6.587171] [<ffff200008084710>] ret_from_fork+0x10/0x40
[    6.592468] Object at ffff800367efa700, in cache kmalloc-128 size: 128
[    6.598969] Allocated:
[    6.601324] PID = 1
[    6.603427]  save_stack_trace_tsk+0x0/0x418
[    6.607603]  save_stack_trace+0x20/0x30
[    6.611430]  kasan_kmalloc+0xd8/0x188
[    6.615087]  ip_auto_config+0x8c4/0x2f1c
[    6.619002]  do_one_initcall+0xcc/0x370
[    6.622832]  kernel_init_freeable+0x5f8/0x6c4
[    6.627178]  kernel_init+0x18/0x190
[    6.630660]  ret_from_fork+0x10/0x40
[    6.634223] Freed:
[    6.636233] PID = 1
[    6.638334]  save_stack_trace_tsk+0x0/0x418
[    6.642510]  save_stack_trace+0x20/0x30
[    6.646337]  kasan_slab_free+0x88/0x178
[    6.650167]  kfree+0xb8/0x478
[    6.653131]  ic_close_devs+0x130/0x154
[    6.656875]  ip_auto_config+0x2ed4/0x2f1c
[    6.660875]  do_one_initcall+0xcc/0x370
[    6.664705]  kernel_init_freeable+0x5f8/0x6c4
[    6.669051]  kernel_init+0x18/0x190
[    6.672534]  ret_from_fork+0x10/0x40
[    6.676098] Memory state around the buggy address:
[    6.680880]  ffff800367efa600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.688078]  ffff800367efa680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    6.695276] >ffff800367efa700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    6.702469]                       ^
[    6.705952]  ffff800367efa780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[    6.713149]  ffff800367efa800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    6.720343] ==================================================================
[    6.727536] Disabling lock debugging due to kernel taint

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27 21:06:53 -07:00
Florian Fainelli
248ccd5ee6 MAINTAINERS: Add Andrew Lunn as co-maintainer of PHYLIB
Andrew has been contributing a lot to PHYLIB over the past months and
his feedback on patches is more than welcome.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27 16:49:27 -07:00
Gao Feng
75c689dca9 netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
In the commit 93557f53e1 ("netfilter: nf_conntrack: nf_conntrack snmp
helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the
snmp_helper is never registered. But it still tries to unregister the
snmp_helper, it could cause the panic.

Now remove the useless snmp_helper and the unregister call in the
error handler.

Fixes: 93557f53e1 ("netfilter: nf_conntrack: nf_conntrack snmp helper")
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27 13:49:13 +02:00
Liping Zhang
9c3f379492 netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.

But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.

For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.

For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.

I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)

Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
  # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
        Also note, in the userspace we only add a 20s' delay, then
        reinject the syn packet to the kernel:
  # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
  # iptables -F FORWARD
  # rmmod ipt_SYNPROXY
  # rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.

Now you will see the panic like this:
  kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
  Call Trace:
   ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
   init_conntrack+0x12b/0x600 [nf_conntrack]
   nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
   ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
   nf_reinject+0x104/0x270
   nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
   ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
   ? nla_parse+0xa0/0x100
   nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
   [...]

One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.

But having such a obscure restriction of nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
it will be easier if we add new ct extend in the future.

Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
anymore, remove it too.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27 13:47:29 +02:00
Liping Zhang
83d90219a5 netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
The nf_ct_helper_hash table is protected by nf_ct_helper_mutex, while
nfct_helper operation is protected by nfnl_lock(NFNL_SUBSYS_CTHELPER).
So it's possible that one CPU is walking the nf_ct_helper_hash for
cthelper add/get/del, another cpu is doing nf_conntrack_helpers_unregister
at the same time. This is dangrous, and may cause use after free error.

Note, delete operation will flush all cthelpers added via nfnetlink, so
using rcu to do protect is not easy.

Now introduce a dummy list to record all the cthelpers added via
nfnetlink, then we can walk the dummy list instead of walking the
nf_ct_helper_hash. Also, keep nfnl_cthelper_dump_table unchanged, it
may be invoked without nfnl_lock(NFNL_SUBSYS_CTHELPER) held.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27 13:47:29 +02:00
Liping Zhang
3b7dabf029 netfilter: invoke synchronize_rcu after set the _hook_ to NULL
Otherwise, another CPU may access the invalid pointer. For example:
    CPU0                CPU1
     -              rcu_read_lock();
     -              pfunc = _hook_;
  _hook_ = NULL;          -
  mod unload              -
     -                 pfunc(); // invalid, panic
     -             rcu_read_unlock();

So we must call synchronize_rcu() to wait the rcu reader to finish.

Also note, in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
by later nf_conntrack_helper_unregister, but I'm inclined to add a
explicit synchronize_rcu after set the nf_nat_snmp_hook to NULL. Depend
on such obscure assumptions is not a good idea.

Last, in nfnetlink_cttimeout, we use kfree_rcu to free the time object,
so in cttimeout_exit, invoking rcu_barrier() is not necessary at all,
remove it too.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-27 13:47:28 +02:00
Alexey Khoroshilov
6ac3b77a6f irda: vlsi_ir: fix check for DMA mapping errors
vlsi_alloc_ring() checks for DMA mapping errors by comparing
returned address with zero, while pci_dma_mapping_error() should be used.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25 20:14:40 -07:00
Arnd Bergmann
834a61d455 net: hns: avoid gcc-7.0.1 warning for uninitialized data
hns_dsaf_set_mac_key() calls dsaf_set_field() on an uninitialized field,
which will then change only a few of its bits, causing a warning with
the latest gcc:

hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_mac_uc_entry':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   (origin) &= (~(mask)); \
            ^~
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_mac_mc_entry':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_add_mac_mc_port':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_del_mac_entry':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_rm_mac_addr':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_del_mac_mc_port':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_mac_uc_entry':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_mac_mc_entry':
hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code is actually correct since we always set all 16 bits of the
port_vlan field, but gcc correctly points out that the first
access does contain uninitialized data.

This initializes the field to zero first before setting the
individual bits.

Fixes: 5483bfcb16 ("net: hns: modify tcam table and set mac key")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25 20:05:32 -07:00
Arnd Bergmann
a17f1861b5 net: hns: fix uninitialized data use
When dev_dbg() is enabled, we print uninitialized data, as gcc-7.0.1
now points out:

ethernet/hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_promisc_tcam':
ethernet/hisilicon/hns/hns_dsaf_main.c:2947:75: error: 'tbl_tcam_data.low.val' may be used uninitialized in this function [-Werror=maybe-uninitialized]
ethernet/hisilicon/hns/hns_dsaf_main.c:2947:75: error: 'tbl_tcam_data.high.val' may be used uninitialized in this function [-Werror=maybe-uninitialized]

We also pass the data into hns_dsaf_tcam_mc_cfg(), which might later
use it (not sure about that), so it seems safer to just always initialize
the tbl_tcam_data structure.

Fixes: 1f5fa2dd1c ("net: hns: fix for promisc mode in HNS driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25 20:05:32 -07:00
Alexei Starovoitov
b1977682a3 bpf: improve verifier packet range checks
llvm can optimize the 'if (ptr > data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 > data_end)
  return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 > data_end)
  return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.

Reported-by: Huapeng Zhou <hzhou@fb.com>
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 20:51:28 -07:00
Eric Dumazet
43a6684519 ping: implement proper locking
We got a report of yet another bug in ping

http://www.openwall.com/lists/oss-security/2017/03/24/6

->disconnect() is not called with socket lock held.

Fix this by acquiring ping rwlock earlier.

Thanks to Daniel, Alexander and Andrey for letting us know this problem.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Jiang <danieljiang0415@gmail.com>
Reported-by: Solar Designer <solar@openwall.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 20:50:28 -07:00
Alexander Duyck
13a8cd191a i40e: Do not enable NAPI on q_vectors that have no rings
When testing the epoll w/ busy poll code I found that I could get into a
state where the i40e driver had q_vectors w/ active NAPI that had no rings.
This was resulting in a divide by zero error.  To correct it I am updating
the driver code so that we only support NAPI on q_vectors that have 1 or
more rings allocated to them.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:28:55 -07:00
Florian Westphal
28ee1b746f secure_seq: downgrade to per-host timestamp offsets
Unfortunately too many devices (not under our control) use tcp_tw_recycle=1,
which depends on timestamps being identical of the same saddr.

Although tcp_tw_recycle got removed in net-next we can't make
such end hosts disappear so downgrade to per-host timestamp offsets.

Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reported-by: Yvan Vanrossomme <yvan@vanrossomme.net>
Fixes: 95a22caee3 ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:27:44 -07:00
Alexander Duyck
95f2552113 net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
This change basically codifies what I think was already the limitations on
the busy_poll and busy_read sysctl interfaces.  We weren't checking the
lower bounds and as such could input negative values. The behavior when
that was used was dependent on the architecture. In order to prevent any
issues with that I am just disabling support for values less than 0 since
this way we don't have to worry about any odd behaviors.

By limiting the sysctl values this way it also makes it consistent with how
we handle the SO_BUSY_POLL socket option since the value appears to be
reported as a signed integer value and negative values are rejected.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 15:02:13 -07:00
Jeff Kirsher
9f47a48e6e Revert "e1000e: driver trying to free already-free irq"
This reverts commit 7e54d9d063.

After additional regression testing, several users are experiencing
kernel panics during shutdown on e1000e devices.  Reverting this
change resolves the issue.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:43:11 -07:00
WANG Cong
a80db69e47 kcm: return immediately after copy_from_user() failure
There is no reason to continue after a copy_from_user()
failure.

Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:13:53 -07:00
Arnd Bergmann
a5af839253 bna: avoid writing uninitialized data into hw registers
The latest gcc-7 snapshot warns about bfa_ioc_send_enable/bfa_ioc_send_disable
writing undefined values into the hardware registers:

drivers/net/ethernet/brocade/bna/bfa_ioc.c: In function 'bfa_iocpf_sm_disabling_entry':
arch/arm/include/asm/io.h:109:22: error: '*((void *)&disable_req+4)' is used uninitialized in this function [-Werror=uninitialized]
arch/arm/include/asm/io.h:109:22: error: '*((void *)&disable_req+8)' is used uninitialized in this function [-Werror=uninitialized]

The two functions look like they should do the same thing, but only one
of them initializes the time stamp and clscode field. The fact that we
only get a warning for one of the two functions seems to be arbitrary,
based on the inlining decisions in the compiler.

To address this, I'm making both functions do the same thing:

- set the clscode from the ioc structure in both
- set the time stamp from ktime_get_real_seconds (which also
  avoids the signed-integer overflow in 2038 and extends the
  well-defined behavior until 2106).
- zero-fill the reserved field

Fixes: 8b230ed8ec ("bna: Brocade 10Gb Ethernet device driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:49:12 -07:00
David S. Miller
1f3466a053 Merge branch 's390-net'
Ursula Braun says:

====================
s390/qeth patches for net

here are 2 s390/qeth patches built for net fixing a problem with AF_IUCV
traffic through HiperSockets.
And we come up with an update for the MAINTAINERS file to establish
Julian as Co-Maintainer for drivers/s390/net and net/iucv.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:40:00 -07:00
Ursula Braun
90b14dc731 MAINTAINERS: add Julian Wiedmann
Add Julian Wiedmann as additional maintainer for drivers/s390/net
and net/iucv.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:40:00 -07:00
Julian Wiedmann
acd9776b5c s390/qeth: no ETH header for outbound AF_IUCV
With AF_IUCV traffic, the skb passed to hard_start_xmit() has a 14 byte
slot at skb->data, intended for an ETH header. qeth_l3_fill_af_iucv_hdr()
fills this ETH header... and then immediately moves it to the
skb's headroom, where it disappears and is never seen again.

But it's still possible for us to return NETDEV_TX_BUSY after the skb has
been modified. Since we didn't get a private copy of the skb, the next
time the skb is delivered to hard_start_xmit() it no longer has the
expected layout (we moved the ETH header to the headroom, so skb->data
now starts at the IUCV_TRANS header). So when qeth_l3_fill_af_iucv_hdr()
does another round of rebuilding, the resulting qeth header ends up
all wrong. On transmission, the buffer is then rejected by
the HiperSockets device with SBALF15 = x'04'.
When this error is passed back to af_iucv as TX_NOTIFY_UNREACHABLE, it
tears down the offending socket.

As the ETH header for AF_IUCV serves no purpose, just align the code to
what we do for IP traffic on L3 HiperSockets: keep the ETH header at
skb->data, and pass down data_offset = ETH_HLEN to qeth_fill_buffer().
When mapping the payload into the SBAL elements, the ETH header is then
stripped off. This avoids the skb manipulations in
qeth_l3_fill_af_iucv_hdr(), and any buffer re-entering hard_start_xmit()
after NETDEV_TX_BUSY is now processed properly.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:40:00 -07:00
Julian Wiedmann
7d969d2e88 s390/qeth: size calculation outbound buffers
Depending on the device type, hard_start_xmit() builds different output
buffer formats. For instance with HiperSockets, on both L2 and L3 we
strip the ETH header from the skb - L3 doesn't need it, and L2 carries
it in the buffer's header element.
For this, we pass data_offset = ETH_HLEN all the way down to
__qeth_fill_buffer(), where skb->data is then adjusted accordingly.
But the initial size calculation still considers the *full* skb length
(including the ETH header). So qeth_get_elements_no() can erroneously
reject a skb as too big, even though it would actually fit into an
output buffer once the ETH header has been trimmed off later.

Fix this by passing an additional offset to qeth_get_elements_no(),
that indicates where in the skb the on-wire data actually begins.
Since the current code uses data_offset=-1 for some special handling
on OSA, we need to clamp data_offset to 0...

On HiperSockets this helps when sending ~MTU-size skbs with weird page
alignment. No change for OSA or AF_IUCV.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:39:59 -07:00
David S. Miller
031b8c6d1d Merge branch 'aquantia-fixes'
Pavel Belous says:

====================
net:ethernet:aquantia: Misc fixes for atlantic driver.

The following patchset containg several fixes for aQuantia AQtion driver
for net tree: A couple fixes for IPv6 and other fixes.

v1->v2: Fix compilation error (using HW_ATL_A0_TXD_CTL_CMD_IPV6 instead
        HW_ATL_B0_TXD_CTL_CMD_IPV6).
v2->v3: Added "Fixes" tags.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:32:20 -07:00
Pavel Belous
5d73bb863c net:ethernet:aquantia: Reset is_gso flag when EOP reached.
We need to reset is_gso flag when EOP reached (entire LSO packet processed).

Fixes: bab6de8fd1 ("net: ethernet: aquantia:
 Atlantic A0 and B0 specific functions.")

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:32:19 -07:00
Pavel Belous
386aff88e3 net:ethernet:aquantia: Fix for LSO with IPv6.
Fix Context Command bit: L3 type = "0" for IPv4, "1" for IPv6.

Fixes: bab6de8fd1 ("net: ethernet: aquantia:
 Atlantic A0 and B0 specific functions.")

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 12:32:19 -07:00