Add some __rcu annotations and use helpers to reduce the number of sparse
warnings (CONFIG_SPARSE_RCU_POINTER=y)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As we own the conntrack and the others can't see it until we confirm it,
we don't need to use atomic bit operations on ct->status.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In function update_alloc_size(), sizeof(struct nf_ct_ext) is wrongly added
twice.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
I am observing consistent behavior even with bridges, so let's unlock
this. xt_mac is already usable in FORWARD, too. Section 9 of
http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section9 says
the MAC source address is changed, but my observation does not match
that claim -- the MAC header is retained.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
[Patrick; code inspection seems to confirm this]
Signed-off-by: Patrick McHardy <kaber@trash.net>
ct->proto is big (60 bytes) due to structure ip_ct_tcp, and we don't need
to initialize the whole of it for all the other protocols. This patch moves
proto to the end of structure nf_conn, and pushes the initialization down
to the individual protocols.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.
Make that explicit with some helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
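A minimal user-space sketch of what such helpers could look like. The commit message does not name the helpers, so rt_is_input_route() and rt_is_output_route() are assumed names here, and the two structs are hypothetical cut-down stand-ins that model only the iif field.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's flow and route structures;
 * only the field discussed in the commit message is modelled. */
struct flow_sketch {
	int iif;	/* input interface index, 0 for locally generated traffic */
};

struct rtable_sketch {
	struct flow_sketch fl;
};

/* A non-zero input interface means the route was built for a received packet. */
bool rt_is_input_route(const struct rtable_sketch *rt)
{
	return rt->fl.iif != 0;
}

/* A zero input interface means the route was built for locally sent traffic. */
bool rt_is_output_route(const struct rtable_sketch *rt)
{
	return rt->fl.iif == 0;
}
```

Callers can then write rt_is_output_route(rt) where they previously compared rt->fl.iif against zero, which states the intent at the call site.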
It seems the idev field in struct rtable has no special purpose, but adds
extra atomic ops.
We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).
The infiniband case is solved using dst.dev instead of idev->dev.
Removal of this field means routing without route cache is now using
shared data, percpu data, and the only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.
About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to move nud_state outside of the often modified cache
line (because of refcnt), to reduce false sharing in neigh_event_send().
This is a followup of commit 0ed8ddf404 (neigh: Protect neigh->ha[]
with a seqlock).
This gives a 7% speedup on routing test with IP route cache disabled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
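The false-sharing argument can be illustrated with a small layout check. This is only a sketch: struct neigh_sketch is a hypothetical stand-in for struct neighbour, the 64-byte cache line is an assumption, and the split is forced here with a C11 alignment specifier, whereas the commit describes moving the field.

```c
#include <stddef.h>

#define CACHELINE 64	/* assumed cache line size, varies by CPU */

/* Hypothetical stand-in for struct neighbour with only three fields. */
struct neigh_sketch {
	/* Written often: every user takes and drops a reference. */
	int refcnt;
	/* Mostly read on the fast path, so it starts a new cache line. */
	_Alignas(CACHELINE) unsigned char nud_state;
	unsigned char ha[32];
};

/* Index of the cache line that a given struct offset falls into. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE;
}
```

With this layout a writer that bumps refcnt no longer invalidates the line that readers of nud_state depend on.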
Update vxge driver version
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct issues found by running sparse on the vxge driver, as well as
other miscellaneous cleanups.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update Kconfig to reflect Exar's purchase of Neterion (formerly S2IO).
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The values used to determine if the adapter is running in single or
multi-function mode were previously modified to the values necessary
when making the VXGE_HW_FW_API_GET_FUNC_MODE firmware call. However,
the firmware call was not modified. This had the driver printing out on
probe that the adapter was in multi-function mode when in single
function mode and vice versa.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Detect if the adapter is Titan or Titan1A, and tune the driver for this
hardware. Also, remove unnecessary function __vxge_hw_device_id_get.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate the return code of the call to vxge_hw_vpath_fw_api and
__vxge_hw_vpath_pci_func_mode_get. This enables the proper handling of
error conditions when querying the function mode of the device during
probe.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for enabling/disabling hardware timestamping on receive
packets via ioctl call. When enabled, the hardware timestamp replaces
the FCS in the payload.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability in the vxge driver to flash firmware via ethtool.
Updated to include comments from Ben Hutchings.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for multiple callers to access the firmware interface for
the same vpath simultaneously, resulting in uncertain output. Add locks
to serialize access. Also, make functions only accessed locally static,
thus requiring some movement of code blocks.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove all of the unnecessary debug printk indirection and temporary
variables for vxge_debug_ll and vxge_assert.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wait for the receive traffic to become idle before attempting to close
or reset the adapter. To enable the processing of packets while Receive
Idle, move the clearing of __VXGE_STATE_CARD_UP bit in vxge_close to
after it. Also, modify the return value of the ISR when the adapter is
down to IRQ_HANDLED. Otherwise there are unhandled interrupts for the
device.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable RSS hashing and add the ability to pass the adapter-calculated rx
hash up the network stack (if feature is available). Add the ability to
enable/disable feature via ethtool, which requires that the adapter is
not running at the time. Other miscellaneous cleanups and fixes
required to get RSS working.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Ram Vepa <ram.vepa@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This completes the implementation of a circular buffer for Ack Vectors, by
extending the current (linear array-based) implementation. The changes are:
(a) An `overflow' flag to deal with the case of overflow. As before, dynamic
growth of the buffer will not be supported; but code will be added to deal
robustly with overflowing Ack Vector buffers.
(b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
which can bring the entire run length calculation completely out of synch.
(This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
ack_vectors/tracking_tail_ackno/ .)
(c) The buffer length is now computed dynamically (i.e. current fill level),
as the span between head and tail.
As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
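The three changes (a) to (c) above can be sketched as a small user-space ring buffer. This is not the DCCP code: struct av_ring, its field names, and the capacity are hypothetical, and overwriting the oldest cell on overflow is a choice made for the sketch only.

```c
#include <stdbool.h>
#include <stddef.h>

#define AV_SKETCH_LEN 8	/* hypothetical capacity, not the kernel's value */

struct av_ring {
	unsigned char buf[AV_SKETCH_LEN];
	size_t head;	/* next slot to be written */
	size_t tail;	/* oldest live slot */
	bool overflow;	/* buffer is full: head == tail means full, not empty */
};

/* Current fill level, computed as the span between tail and head. */
size_t av_ring_len(const struct av_ring *r)
{
	if (r->overflow)
		return AV_SKETCH_LEN;
	return (r->head + AV_SKETCH_LEN - r->tail) % AV_SKETCH_LEN;
}

/* Append one cell; when the buffer is already full, drop the oldest cell. */
void av_ring_push(struct av_ring *r, unsigned char cell)
{
	if (r->overflow)
		r->tail = (r->tail + 1) % AV_SKETCH_LEN;
	r->buf[r->head] = cell;
	r->head = (r->head + 1) % AV_SKETCH_LEN;
	if (r->head == r->tail)
		r->overflow = true;
}

/* Remove the oldest cell; returns false when the buffer is empty. */
bool av_ring_pop(struct av_ring *r, unsigned char *cell)
{
	if (av_ring_len(r) == 0)
		return false;
	*cell = r->buf[r->tail];
	r->tail = (r->tail + 1) % AV_SKETCH_LEN;
	r->overflow = false;
	return true;
}
```

The overflow flag is what lets head == tail be read unambiguously, so no separate length field has to be kept in step with the indices.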
This patch
* separates Ack Vector housekeeping code from option-insertion code;
* shifts option-specific code from ackvec.c into options.c;
* introduces a dedicated routine to take care of the Ack Vector records;
* simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant,
since the list is automatically arranged in descending order of ack_seqno.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch brings the Ack Vector interface up to date. Its main purpose is
to lay the basis for the subsequent patches of this set, which will use the
new data structure fields and routines.
There are no real algorithmic changes, rather an adaptation:
(1) Replaced the static Ack Vector size (2) with a #define so that it can
be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
to be sufficient for the moment) and added a solution so that computing
the ECN nonce will continue to work - even with larger Ack Vectors.
(2) Replaced the #defines for Ack Vector states with a complete enum.
(3) Replaced #defines to compute Ack Vector length and state with general
purpose routines (inlines), and updated code to use these.
(4) Added a `tail' field (conversion to circular buffer in subsequent patch).
(5) Updated the (outdated) documentation for Ack Vector struct.
(6) All sequence number containers now trimmed to 48 bits.
(7) Removal of unused bits:
* removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
* removed Elapsed Time for Ack Vectors (it was nowhere used);
* replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
the code needs to be able to remember the old run length;
* reduced the de-/allocation routines (redundant / duplicate tests).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
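Items (2) and (3) above can be sketched as follows, assuming the Ack Vector cell layout of RFC 4340, section 11.4 (two state bits followed by a six-bit run length). The identifiers are hypothetical and chosen for this sketch; they are not claimed to match the names used in the patch.

```c
#include <stdint.h>

/* Cell states, encoded in the top two bits of each Ack Vector cell. */
enum av_state {
	AV_RECEIVED	= 0x00,
	AV_ECN_MARKED	= 0x40,
	AV_RESERVED	= 0x80,
	AV_NOT_RECEIVED	= 0xC0,
};

#define AV_MAX_RUNLEN 0x3F	/* low six bits hold the run length */

/* Run length stored in a cell. */
uint8_t av_cell_runlen(uint8_t cell)
{
	return cell & AV_MAX_RUNLEN;
}

/* State stored in a cell. */
enum av_state av_cell_state(uint8_t cell)
{
	return (enum av_state)(cell & ~AV_MAX_RUNLEN);
}
```

For example, the cell 0xC5 decodes to state AV_NOT_RECEIVED with a run length of 5.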
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default we add firmware information to ethtool get regs.
Optionally firmware info can instead be sent to log.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coalesce long formats.
Align arguments.
Remove KERN_<level>.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some network drivers use old TX_TIMEOUT definitions that assume the HZ=100 of
old kernels.
Convert these definitions to include HZ, since HZ can be 1000 these
days.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
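A small user-space sketch of why the conversion matters. HZ is a kernel build-time constant; it is defined as a plain macro here only so the example compiles outside the kernel, and the 4 second timeout, TX_TIMEOUT_OLD, and ticks_to_ms() are assumptions made for illustration.

```c
/* Stand-in for the kernel's tick rate; the value is an assumption. */
#ifndef HZ
#define HZ 1000
#endif

/* Old style: a bare tick count, which is 4 seconds only when HZ == 100. */
#define TX_TIMEOUT_OLD	400
/* Converted style: the timeout scales with the configured tick rate. */
#define TX_TIMEOUT	(4 * HZ)

/* Convert a tick count to milliseconds for a given tick rate. */
unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
	return ticks * 1000UL / hz;
}
```

With HZ=1000 the old constant shrinks the intended 4 second timeout to 400 ms, while the HZ-based definition keeps it at 4 seconds.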
unix_dgram_poll() is pretty expensive to check POLLOUT status, because
it has to lock the socket to get its peer, take a reference on the peer
to check its receive queue status, and queue another poll_wait on
peer_wait. This all can be avoided if the process calling
unix_dgram_poll() is not interested in POLLOUT status. It makes
unix_dgram_recvmsg() faster by not queueing irrelevant pollers in
peer_wait.
On a test program provided by Alban Crequy:
Before:
real 0m0.211s
user 0m0.000s
sys 0m0.208s
After:
real 0m0.044s
user 0m0.000s
sys 0m0.040s
Suggested-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
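The idea behind the unix_dgram_poll() change can be sketched in user space. Everything here is hypothetical: struct dgram_sock_sketch, peer_can_accept(), and dgram_poll_sketch() are made-up names, and how the kernel learns which events the caller asked for is not modelled.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical socket model with just enough state for the example. */
struct dgram_sock_sketch {
	bool has_data;			/* receive queue is non-empty */
	bool peer_queue_full;		/* peer's receive queue is full */
	unsigned int peer_checks;	/* how often the costly path ran */
};

/* Stand-in for the costly part: locking the socket, taking a reference
 * on the peer, and inspecting its receive queue. */
static bool peer_can_accept(struct dgram_sock_sketch *sk)
{
	sk->peer_checks++;
	return !sk->peer_queue_full;
}

/* Return the ready events, consulting the peer only if POLLOUT is wanted. */
short dgram_poll_sketch(struct dgram_sock_sketch *sk, short wanted)
{
	short mask = 0;

	if (sk->has_data)
		mask |= POLLIN;

	/* The caller did not ask about writability: skip the peer entirely. */
	if (!(wanted & POLLOUT))
		return mask;

	if (peer_can_accept(sk))
		mask |= POLLOUT;
	return mask;
}
```

A reader that only waits for POLLIN never touches the peer, which is the saving the timings above reflect.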
Alban Crequy reported a problem with connected dgram af_unix sockets and
provided a test program. epoll() would fail to deliver an EPOLLOUT event
when a thread dequeues a packet from the other peer, making its receive
queue no longer full.
This is because unix_dgram_poll() fails to call sock_poll_wait(file,
&unix_sk(other)->peer_wait, wait);
if the socket is not writeable at the time epoll_ctl(ADD) is called.
We must call sock_poll_wait(), regardless of 'writable' status, so that
epoll can be notified later of state changes.
Misc: avoids testing twice (sk->sk_shutdown & RCV_SHUTDOWN)
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of waking up all sleepers, use wake_up_interruptible_sync_poll() to
wake up only the ones interested in writing to the socket.
This patch is a specialization of commit 37e5540b3c (epoll keyed
wakeups: make sockets use keyed wakeups).
On a test program provided by Alban Crequy:
Before:
real 0m3.101s
user 0m0.000s
sys 0m6.104s
After:
real 0m0.211s
user 0m0.000s
sys 0m0.208s
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
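A keyed wakeup can be sketched in user space as a filter over the list of sleepers. This only models the idea: struct waiter_sketch and wake_keyed() are hypothetical and do not reproduce the kernel's wait queue implementation.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical sleeper: the events it registered for and a wakeup counter. */
struct waiter_sketch {
	short interested;
	unsigned int wakeups;
};

/* Wake only the sleepers whose interest overlaps the key; return the count. */
size_t wake_keyed(struct waiter_sketch *w, size_t n, short key)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(w[i].interested & key))
			continue;	/* not waiting for this event, let it sleep */
		w[i].wakeups++;
		woken++;
	}
	return woken;
}
```

Waking with a POLLOUT key leaves pure readers asleep, so they are not scheduled just to find out that nothing changed for them.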
While tracking dev_base_lock users, I found decnet used it in
dnet_select_source(), but for a wrong purpose:
Writers only hold RTNL, not dev_base_lock, so readers must use RCU if
they cannot use RTNL.
Add an rcu_head to struct dn_ifaddr and handle RCU management properly.
Add an __rcu annotation in dn_route as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_info_seq_start() uses a read_lock(&dev_base_lock) to make sure
the device doesn't disappear. The same goal can be achieved using RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_base_lock is the legacy way to lock the device list, and is planned
to disappear. (writers hold RTNL, readers hold RCU lock)
Convert aoecmd_cfg_pkts() to RCU locking.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add suspend/resume support using default open/stop interface methods
to do hardware dependent operations.
On suspend, the same low power state (soft power mode) will be kept and the
following blocks will be disabled:
- Internal PLL Clock
- Tx/Rx PHY
- MAC
- SPI Interface
Signed-off-by: Abraham Arce <x0066660@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sgs allocation error path leaks the allocated message.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
QDIO runs independently of netdevice state. We are not
allowed to schedule NAPI when the netdevice is not open.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a certain Hipersockets specific error code in the xmit path, the
qeth driver tries to invoke dev_queue_xmit again.
Commit 79640a4ca6 introduces a busylock
causing locking problems in case of re-invoked dev_queue_xmit by qeth.
This patch removes the attempts to retry packet sending with
dev_queue_xmit from the qeth driver.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a bug reported by backyes.
The very first time through, pktgen uses a queue_map that has not yet been
initialized by set_cur_queue_map(pkt_dev).
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Backyes <backyes@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
After e6484930d7: net: allocate tx queues in register_netdevice
These calls make net drivers oops at load time, so let's avoid people
git-bisect'ing known problems.
Signed-off-by: Guillaume Chazarain <guichaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After e6484930d7: net: allocate tx queues in register_netdevice
It causes an Oops at skge_probe() time.
Signed-off-by: Guillaume Chazarain <guichaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>