Commit Graph

591183 Commits

Author SHA1 Message Date
Peter Zijlstra 79c9ce57eb perf/core: Fix perf_event_open() vs. execve() race
Jann reported that the ptrace_may_access() check in
find_lively_task_by_vpid() is racy against exec().

Specifically:

  perf_event_open()		execve()

  ptrace_may_access()
				commit_creds()
  ...				if (get_dumpable() != SUID_DUMP_USER)
				  perf_event_exit_task();
  perf_install_in_context()

would result in installing a counter across the creds boundary.

Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
This should be fine as perf_event_exit_task() is already called with
cred_guard_mutex held, so all perf locks already nest inside it.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:41 +02:00
Adam Borowski 0a25556f84 perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.

Found via KASAN:

  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
  index 9 is out of range for type 'u64 [9]'
  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
  load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:20:25 +02:00
Ilya Dryomov d3767f0fae rbd: report unsupported features to syslog
... instead of just returning an error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:43 +02:00
Ilya Dryomov 811c668877 rbd: fix rbd map vs notify races
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:22 +02:00
Keith Busch 1bdb897039 x86/apic: Handle zero vector gracefully in clear_vector_irq()
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().

The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().

We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.

Remove the BUG_ON and return if vector is zero,

[ tglx: Massaged changelog ]

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-28 09:53:06 +02:00
David S. Miller 8be2748a40 Merge branch 'socket-space-optimizations'
Eric Dumazet says:

====================
net: avoid some atomic ops when FASYNC is not used

We can avoid some atomic operations on sockets not using FASYNC
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 23:08:41 -04:00
Eric Dumazet 4be735225f net: SOCKWQ_ASYNC_WAITDATA optimizations
SOCKWQ_ASYNC_WAITDATA is set/cleared in sk_wait_data()
and equivalent functions, so that sock_wake_async() can send
a SIGIO only when necessary.

Since these atomic operations are really not needed unless
socket expressed interest in FASYNC, we can omit them in most
cases.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 23:08:40 -04:00
Eric Dumazet 9317bb6982 net: SOCKWQ_ASYNC_NOSPACE optimizations
SOCKWQ_ASYNC_NOSPACE is tested in sock_wake_async()
so that a SIGIO signal is sent when needed.

tcp_sendmsg() clears the bit.
tcp_poll() sets the bit when stream is not writeable.

We can avoid two atomic operations by first checking if socket
is actually interested in the FASYNC business (most sockets in
real applications do not use AIO, but select()/poll()/epoll())

This also removes one cache line miss to access sk->sk_wq->flags
in tcp_sendmsg()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 23:08:40 -04:00
David S. Miller 210732d16d Merge branch 'snmp-stats-update'
Eric Dumazet says:

====================
net: snmp: update SNMP methods

In the old days (before linux-3.0), SNMP counters were duplicated,
one set for user context, and anther one for BH context.

After commit 8f0ea0fe3a ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

This patch series kills the obsolete STATS_USER() helpers,
and rename all XXX_BH() helpers to __XXX() ones, to more
closely match conventions used to update per cpu variables.

This is probably going to hurt maintainers job for a while,
since cherry-picks will not be clean, but this had to be
cleaned at one point. I am so sorry guys.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:25 -04:00
Eric Dumazet 13415e46c5 net: snmp: kill STATS_BH macros
There is nothing related to BH in SNMP counters anymore,
since linux-3.0.

Rename helpers to use __ prefix instead of _BH prefix,
for contexts where preemption is disabled.

This more closely matches convention used to update
percpu variables.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:25 -04:00
Eric Dumazet f3832ed2c2 ipv6: kill ICMP6MSGIN_INC_STATS_BH()
IPv6 ICMP stats are atomics anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:25 -04:00
Eric Dumazet c2005eb010 ipv6: rename IP6_UPD_PO_STATS_BH()
Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:25 -04:00
Eric Dumazet 1d01550359 ipv6: rename IP6_INC_STATS_BH()
Rename IP6_INC_STATS_BH() to __IP6_INC_STATS()
and IP6_ADD_STATS_BH() to __IP6_ADD_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:24 -04:00
Eric Dumazet 02a1d6e7a6 net: rename NET_{ADD|INC}_STATS_BH()
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:24 -04:00
Eric Dumazet b15084ec7d net: rename IP_UPD_PO_STATS_BH()
Rename IP_UPD_PO_STATS_BH() to __IP_UPD_PO_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:24 -04:00
Eric Dumazet 98f619957e net: rename IP_ADD_STATS_BH()
Rename IP_ADD_STATS_BH() to __IP_ADD_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:24 -04:00
Eric Dumazet a16292a0f0 net: rename ICMP6_INC_STATS_BH()
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:24 -04:00
Eric Dumazet b45386efa2 net: rename IP_INC_STATS_BH()
Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to
better express this is used in non preemptible context.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet 08e3baef65 net: sctp: rename SCTP_INC_STATS_BH()
Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet 214d3f1f87 net: icmp: rename ICMPMSGIN_INC_STATS_BH()
Remove misleading _BH suffix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet 90bbcc6083 net: tcp: rename TCP_INC_STATS_BH
Rename TCP_INC_STATS_BH() to __TCP_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet b540f9d702 net: xfrm: kill XFRM_INC_STATS_BH()
Not used anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet 02c223470c net: udp: rename UDP_INC_STATS_BH()
Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:23 -04:00
Eric Dumazet 5d3848bc33 net: rename ICMP_INC_STATS_BH()
Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:22 -04:00
Eric Dumazet aa62d76b6e dccp: rename DCCP_INC_STATS_BH()
Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:22 -04:00
Eric Dumazet 6aef70a851 net: snmp: kill various STATS_USER() helpers
In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe3a ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 22:48:22 -04:00
David S. Miller 2995aea5b6 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-04-27

This series contains updates to i40e and i40evf.

Alex Duyck cleans up the feature flags since they are becoming pretty
"massive", the primary change being that we now build our features list
around hw_encap_features.  Added support for IPIP and SIT offloads,
which should improvement in throughput for IPIP and SIT tunnels with
the offload enabled.

Mitch adds support for configuring RSS on behalf of the VFs, which removes
the burden of dealing with different hardware interfaces from the VF
drivers and improves future compatibility.  Fix to ensure that we do not
panic by checking that the vsi_res pointer is valid before dereferencing
it, after which we can drink beer and eat peanuts.

Shannon does come housekeeping in i40e_add_fdir_ethtool() in preparation
for more cloud filter work.  Added flexibility to the nvmupdate
facility by adding the ability to specify an AQ event opcode to wait on
after Exec_AQ request.

Michal adds device capability which defines if an update is available and
if a security check is needed during the update process.

Kamil just adds a device id to support X722 QSFP+ device.

Greg fixes an issue where a mirror rule ID may be zero, so do not return
invalid parameter when the user passes in a zero for a rule ID.  Adds
support to steer packets to VSIs by VLAN tag alone while being in
promiscuous mode for multicast and unicast MAC addresses.

Jesse fixes the driver from offloading the VLAN tag into the skb any
time there was a VLAN tag and the hardware stripping was enabled, to
making sure it is enabled before put_tag.

v2: Dropped patch 8 ("i40e: Allow user to change input set mask for flow
    director") while Kiran reworks a more generalized solution based
    on feedback from David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 21:59:08 -04:00
Eric Dumazet 501e7ef569 net-rfs: fix false sharing accessing sd->input_queue_head
sd->input_queue_head is incremented for each processed packet
in process_backlog(), and read from other cpus performing
Out Of Order avoidance in get_rps_cpu()

Moving this field in a separate cache line keeps it mostly
hot for the cpu in process_backlog(), as other cpus will
only read it.

In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
and the patch reduced the cost to 0.65 %

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 21:55:45 -04:00
Akinobu Mita 35ef7d689d net: w5100: support W5500
This adds support for W5500 chip.

W5500 has similar register and memory organization with W5100 and W5200.
There are a few important differences listed below but it is still
possible to share common code with W5100 and W5200.

* W5500 register and memory are organized by multiple blocks.  Each one
is selected by 16bits offset address and 5bits block select bits.

But the existing register access operations take u16 address.  This change
extends the addess by u32 address and put offset address to lower 16bits
and block select bits to upper 16bits.

This change also adds the offset addresses for socket register and TX/RX
memory blocks to the driver private data structure in order to reduce
conditional switches for each chip.

* W5500 has the different register offset for socket interrupt mask
register.  Newly added internal functions w5100_enable_intr() and
w5100_disable_intr() take care of the diffrence.

* W5500 has the different register offset for retry time-value register.
But this register is only used to verify that the reset value is correctly
read at initialization.  So move the verification to w5100_hw_reset()
which already does different things for different chips.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Mike Sinkovsky <msink@permonline.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 21:55:45 -04:00
Leo Yan 15333e3af1 thermal: use %d to print S32 parameters
Power allocator's parameters are S32 type, so use %d to print them.

Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:51 -07:00
Leo Yan 5fdfc48bb0 thermal: hisilicon: increase temperature resolution
When calculate temperature, old code firstly do division and then
convert to "millicelsius" unit. This will lose resolution and only can
read back temperature with "Celsius" unit.

So firstly scale step value to "millicelsius" and then do division, so
finally we can increase resolution for temperature value. Also refine
the calculation from temperature value to step value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:01 -07:00
David S. Miller 49fa523046 sparc64: Fix bootup regressions on some Kconfig combinations.
The system call tracing bug fix mentioned in the Fixes tag
below increased the amount of assembler code in the sequence
of assembler files included by head_64.S

This caused to total set of code to exceed 0x4000 bytes in
size, which overflows the expression in head_64.S that works
to place swapper_tsb at address 0x408000.

When this is violated, the TSB is not properly aligned, and
also the trap table is not aligned properly either.  All of
this together results in failed boots.

So, do two things:

1) Simplify some code by using ba,a instead of ba/nop to get
   those bytes back.

2) Add a linker script assertion to make sure that if this
   happens again the build will fail.

Fixes: 1a40b95374 ("sparc: Fix system call tracing register handling.")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 17:27:37 -04:00
David S. Miller 97d601d5de Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes for net.

Only use MSIX on VF, and fix rx page buffers on architectures with
PAGE_SIZE >= 64K.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:46 -04:00
Michael Chan 89d0a06c51 bnxt_en: Divide a page into 32K buffers for the aggregation ring if necessary.
If PAGE_SIZE is bigger than BNXT_RX_PAGE_SIZE, that means the native CPU
page is bigger than the maximum length of the RX BD.  Divide the page
into multiple 32K buffers for the aggregation ring.

Add an offset field in the bnxt_sw_rx_agg_bd struct to keep track of the
page offset of each buffer.  Since each page can be referenced by multiple
buffer entries, call get_page() as needed to get the proper reference
count.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Michael Chan 2839f28bd5 bnxt_en: Limit RX BD pages to be no bigger than 32K.
The RX BD length field of this device is 16-bit, so the largest buffer
size is 65535.  For LRO and GRO, we allocate native CPU pages for the
aggregation ring buffers.  It won't work if the native CPU page size is
64K or bigger.

We fix this by defining BNXT_RX_PAGE_SIZE to be native CPU page size
up to 32K.  Replace PAGE_SIZE with BNXT_RX_PAGE_SIZE in all appropriate
places related to the rx aggregation ring logic.

The next patch will add additional logic to divide the page into 32K
chunks for aggrgation ring buffers if PAGE_SIZE is bigger than
BNXT_RX_PAGE_SIZE.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Michael Chan 1fa72e29e1 bnxt_en: Don't fallback to INTA on VF.
Only MSI-X can be used on a VF.  The driver should fail initialization
if it cannot successfully enable MSI-X.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Anjali Singhai Jain 47d3483988 i40evf: Add driver support for promiscuous mode
Add necessary Linux Ethernet driver support for promiscuous mode
operation. Add a flag so the VF knows it is in promiscuous mode
and two state flags to discreetly track multicast and unicast
promiscuous states.

Change-Id: Ib2f2dc7a7582304fec90fc917ebb7ded21ba1de4
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:06:01 -07:00
Anjali Singhai Jain 5676a8b9cd i40e: Add VF promiscuous mode driver support
Add infrastructure for Network Function Virtualization VLAN tagged
packet steering feature.

Change-Id: I9b873d8fcc253858e6baba65ac68ec5b9363944e
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:57 -07:00
Greg Rose 6c41a76069 i40e: Add promiscuous on VLAN support
NFV use cases require the ability to steer packets to VSIs by VLAN tag
alone while being in promiscuous mode for multicast and unicast MAC
addresses.  These two new functions support that ability.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:53 -07:00
Jesse Brandeburg a149f2c323 i40e/i40evf: Only offload VLAN tag if enabled
The driver was offloading the VLAN tag into the skb
any time there was a VLAN tag and the hardware stripping was
enabled.  Just check to make sure it's enabled before put_tag.

Change-Id: Ife95290c06edd9a616393b38679923938b382241
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:47 -07:00
Greg Rose db0772782f i40e: Remove zero check
A mirror rule ID may be zero so do not return invalid parameter when the
user passes in a zero value for a rule ID.

Change-ID: I261b8c24725ce2c6ed32f859da81093dfcbe2970
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:41 -07:00
Kamil Krawczyk bccf474435 i40e: Add DeviceID for X722 QSFP+
Change-ID: I1370fbc7774e815ac1ad56561e97488e829592fc
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:29 -07:00
Michal Kosiarz 68a1c5a777 i40e: Add device capability which defines if update is available
Add device capability which defines if update is available and security
check is needed during update process.

Change-ID: I380787c878275e1df18b39198df3ee3666342282
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:01:48 -07:00
David S. Miller c0cc53162a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes in the conflicts.

In the macsec case, the change of the default ID macro
name overlapped with the 64-bit netlink attribute alignment
fixes in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 15:43:10 -04:00
David Ahern 8c14586fc3 net: ipv6: Use passed in table for nexthop lookups
Similar to 3bfd847203 ("net: Use passed in table for nexthop lookups")
for IPv4, if the route spec contains a table id use that to lookup the
next hop first and fall back to a full lookup if it fails (per the fix
4c9bcd1179 ("net: Fix nexthop lookups")).

Example:

    root@kenny:~# ip -6 ro ls table red
    local 2100:1::1 dev lo  proto none  metric 0  pref medium
    2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
    local 2100:2::1 dev lo  proto none  metric 0  pref medium
    2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
    local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
    local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
    fe80::/64 dev eth1  proto kernel  metric 256  pref medium
    fe80::/64 dev eth2  proto kernel  metric 256  pref medium
    ff00::/8 dev red  metric 256  pref medium
    ff00::/8 dev eth1  metric 256  pref medium
    ff00::/8 dev eth2  metric 256  pref medium
    unreachable default dev lo  metric 240  error -113 pref medium

    root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
    RTNETLINK answers: No route to host

Route add fails even though 2100:1::64 is a reachable next hop:
    root@kenny:~# ping6 -I red  2100:1::64
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms

With this patch:
    root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64
    root@kenny:~# ip -6 ro ls table red
    local 2100:1::1 dev lo  proto none  metric 0  pref medium
    2100:1::/120 dev eth1  proto kernel  metric 256  pref medium
    local 2100:2::1 dev lo  proto none  metric 0  pref medium
    2100:2::/120 dev eth2  proto kernel  metric 256  pref medium
    2100:3::/64 via 2100:1::64 dev eth1  metric 1024  pref medium
    local fe80::e0:f9ff:fe09:3cac dev lo  proto none  metric 0  pref medium
    local fe80::e0:f9ff:fe1c:b974 dev lo  proto none  metric 0  pref medium
    fe80::/64 dev eth1  proto kernel  metric 256  pref medium
    fe80::/64 dev eth2  proto kernel  metric 256  pref medium
    ff00::/8 dev red  metric 256  pref medium
    ff00::/8 dev eth1  metric 256  pref medium
    ff00::/8 dev eth2  metric 256  pref medium
    unreachable default dev lo  metric 240  error -113 pref medium

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 15:34:42 -04:00
Linus Torvalds b75a2bf899 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will soon be at least one execution of the work
  item - where "after" means that the execution instance would be able
  to see all the changes that the queuer has made prior to the queueing
  attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v3.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
2016-04-27 12:03:59 -07:00
Linus Torvalds 763cfc86ee Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two patches to fix a deadlock which can be easily triggered if memcg
  charge moving is used.

  This bug was introduced while converting threadgroup locking to a
  global percpu_rwsem and is caused by cgroup controller task migration
  path depending on the ability to create new kthreads.  cpuset had a
  similar issue which was fixed by performing heavy-lifting operations
  asynchronous to task migration.  The two patches fix the same issue in
  memcg in a similar way.  The first patch makes the mechanism generic
  and the second relocates memcg charge moving outside the migration
  path.

  Given that we don't want to perform heavy operations while
  writelocking threadgroup lock anyway, moving them out of the way is a
  desirable solution.  One thing to note is that the problem was
  difficult to debug because lockdep couldn't figure out the deadlock
  condition.  Looking into how to improve that"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  memcg: relocate charge moving from ->attach to ->post_attach
  cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
2016-04-27 11:41:14 -07:00
Linus Torvalds 3118e5f966 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
  patches"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
  i2c: cpm: Fix build break due to incompatible pointer types
  i2c: ismt: Add Intel DNV PCI ID
  i2c: xlp9xx: add support for Broadcom Vulcan
  i2c: rk3x: add support for rk3228
2016-04-27 11:34:45 -07:00
Nicolas Dichtel 570d8e9398 taskstats: fix nl parsing in accounting/getdelays.c
The type TASKSTATS_TYPE_NULL should always be ignored.

When jumping to the next attribute, only the length of the current
attribute should be added, not the length of all nested attributes.
This last bug was not visible before commit 80df554275, because the
kernel didn't put more than two nested attributes.

Fixes: a3baf649ca ("[PATCH] per-task-delay-accounting: documentation")
Fixes: 80df554275 ("taskstats: use the libnl API to align nlattr on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 12:50:14 -04:00
Linus Torvalds 24131a61ec ARC fixes for 4.6-rc6
- LOCKDEP now words for ARCv2 builds
  - Enabling DT reserved-memory binding to work (for forthcoming HDMI driver)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs
 xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv
 wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc
 4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ
 xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU
 cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls
 I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S
 a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD
 Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH
 scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ
 0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl
 E2sBDIJl2Xu2E7amTJss
 =rlDa
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - lockdep now works for ARCv2 builds

 - enable DT reserved-memory binding (for forthcoming HDMI driver)

* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: add support for reserved memory defined by device tree
  ARC: support generic per-device coherent dma mem
  Documentation: dt: arc: fix spelling mistakes
  ARCv2: Enable LOCKDEP
2016-04-27 09:46:21 -07:00