Commit graph

9540 commits

Author SHA1 Message Date
Pavel Emelyanov
923c6586b0 mib: put icmpmsg statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:04:22 -07:00
Pavel Emelyanov
b60538a0d7 mib: put icmp statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:04:02 -07:00
Pavel Emelyanov
386019d351 mib: put udplite statistics on struct net
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:45 -07:00
Pavel Emelyanov
2f275f91a4 mib: put udp statistics on struct net
Similar to... ouch, I repeat myself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:27 -07:00
Pavel Emelyanov
61a7e26028 mib: put net statistics on struct net
Similar to ip and tcp ones :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:03:08 -07:00
Pavel Emelyanov
a20f5799ca mib: put ip statistics on struct net
Similar to tcp one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:02:42 -07:00
Pavel Emelyanov
57ef42d59d mib: put tcp statistics on struct net
Proc temporary uses stats from init_net.

BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:02:08 -07:00
Pavel Emelyanov
9b4661bd6e ipv4: add pernet mib operations
These ones are currently empty, but stuff from init_ipv4_mibs will
sequentially migrate there.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 04:01:44 -07:00
David S. Miller
49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
David S. Miller
a0c80b80e0 pkt_sched: Make default qdisc nonshared-multiqueue safe.
Instead of 'pfifo_fast' we have just plain 'fifo_fast'.
No priority queues, just a straight FIFO.

This is necessary in order to legally have a seperate
qdisc per queue in multi-TX-queue setups, and thus get
full parallelization.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:33 -07:00
David S. Miller
99194cff39 pkt_sched: Add multiqueue handling to qdisc_graft().
Move the destruction of the old queue into qdisc_graft().

When operating on a root qdisc (ie. "parent == NULL"), apply
the operation to all queues.  The caller has grabbed a single
implicit reference for this graft, therefore when we apply the
change to more than one queue we must grab additional qdisc
references.

Otherwise, we are operating on a class of a specific parent qdisc, and
therefore no multiqueue handling is necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:30 -07:00
David S. Miller
8387400092 pkt_sched: Kill netdev_queue lock.
We can simply use the qdisc->q.lock for all of the
qdisc tree synchronization.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:30 -07:00
David S. Miller
c7e4f3bbb4 pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:29 -07:00
David S. Miller
53049978df pkt_sched: Make qdisc grafting locking more specific.
Lock the root of the qdisc being operated upon.

All explicit references to qdisc_tree_lock() are now gone.
The only remaining uses are via the sch_tree_{lock,unlock}()
and tcf_tree_{lock,unlock}() macros.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:27 -07:00
David S. Miller
ead81cc5fc netdevice: Move qdisc_list back into net_device proper.
And give it it's own lock.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:26 -07:00
David S. Miller
15b458fa65 pkt_sched: Kill qdisc_lock_tree usage in cls_route.c
It just wants the qdisc tree to be synchronized, so grabbing
qdisc_root_lock() is sufficient.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:25 -07:00
David S. Miller
55dbc640c3 pkt_sched: Remove qdisc_lock_tree usage in cls_api.c
It just wants the qdisc tree for the filter to be synchronized.
So just BH lock qdisc_root_lock(q) instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:24 -07:00
David S. Miller
17715e62a5 pkt_sched: Use per-queue locking in shutdown_scheduler_queue.
This eliminates another qdisc_lock_tree user.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:23 -07:00
David S. Miller
8a34c5dc3a pkt_sched: Perform bulk of qdisc destruction in RCU.
This allows less strict control of access to the qdisc attached to a
netdev_queue.  It is even allowed to enqueue into a qdisc which is
in the process of being destroyed.  The RCU handler will toss out
those packets.

We will need this to handle sharing of a qdisc amongst multiple
TX queues.  In such a setup the lock has to be shared, so will
be inside of the qdisc itself.  At which point the netdev_queue
lock cannot be used to hard synchronize access to the ->qdisc
pointer.

One operation we have to keep inside of qdisc_destroy() is the list
deletion.  It is the only piece of state visible after the RCU quiesce
period, so we have to undo it early and under the appropriate locking.

The operations in the RCU handler do not need any looking because the
qdisc tree is no longer visible to anything at that point.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:22 -07:00
David S. Miller
16361127eb pkt_sched: dev_init_scheduler() does not need to lock qdisc tree.
We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:21 -07:00
David S. Miller
37437bb2e1 pkt_sched: Schedule qdiscs instead of netdev_queue.
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:20 -07:00
David S. Miller
7698b4fcab pkt_sched: Add and use qdisc_root() and qdisc_root_lock().
When code wants to lock the qdisc tree state, the logic
operation it's doing is locking the top-level qdisc that
sits of the root of the netdev_queue.

Add qdisc_root_lock() to represent this and convert the
easiest cases.

In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:19 -07:00
David S. Miller
e2627c8c22 pkt_sched: Make QDISC_RUNNING a qdisc state.
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
d3b753db7c pkt_sched: Move gso_skb into Qdisc.
We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:18 -07:00
David S. Miller
8f0f2223cc net: Implement simple sw TX hashing.
It just xor hashes over IPv4/IPv6 addresses and ports of transport.

The only assumption it makes is that skb_network_header() is set
correctly.

With bug fixes from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:13 -07:00
David S. Miller
51cb6db0f5 mac80211: Reimplement WME using ->select_queue().
The only behavior change is that we do not drop packets under any
circumstances.  If that is absolutely needed, we could easily add it
back.

With cleanups and help from Johannes Berg.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:12 -07:00
David S. Miller
eae792b722 netdev: Add netdev->select_queue() method.
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().

This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.

This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:10 -07:00
David S. Miller
fd2ea0a79f net: Use queue aware tests throughout.
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:07 -07:00
David S. Miller
24344d2600 mac80211: Temporarily mark QoS support BROKEN.
We will undo this after a few changsets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:05 -07:00
David S. Miller
1d8ae3fdeb pkt_sched: Remove RR scheduler.
This actually fixes a bug added by the RR scheduler changes.  The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.

It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of the
the individual DMA TX rings and the single physical MAC/PHY port.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:04 -07:00
David S. Miller
09e83b5d7d netdev: Kill NETIF_F_MULTI_QUEUE.
There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:03 -07:00
David S. Miller
e8a0464cc9 netdev: Allocate multiple queues for TX.
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 19:21:00 -07:00
Patrick McHardy
51ce7ec921 garp: retry sending JoinIn messages after allocation failures
Increase reliability by retrying to send JoinIn messages after memory
allocation failures on each TRANSMIT_PDU event until it succeeds.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:51:47 -07:00
Neil Horman
9a6d276e85 core: add stat to track unresolved discards in neighbor cache
in __neigh_event_send, if we have a neighbour entry which is in
NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
to the neighbours arp_queue, which is default capped to a length of 3
skbs.  If that queue exceeds its set length, it will drop an skb on
the queue to enqueue the newly arrived skb.  This results in a drop
for which we have no statistics incremented.  This patch adds an
unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
lost frames.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:50:49 -07:00
Pavel Emelyanov
ed88098e25 mib: add net to NET_ADD_STATS_USER
Done with NET_XXX_STATS macros :)

To be continued...

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:32:45 -07:00
Pavel Emelyanov
f2bf415cfe mib: add net to NET_ADD_STATS_BH
This one is tricky. 

The thing is that this macro is only used when killing tw buckets, 
but since this killer is promiscuous wrt to which net each particular
tw belongs to, I have to use it only when NET_NS is off. When the net
namespaces are on, I use the INET_INC_STATS_BH for each bucket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:32:25 -07:00
Pavel Emelyanov
6f67c817fc mib: add net to NET_INC_STATS_USER
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:39 -07:00
Pavel Emelyanov
de0744af1f mib: add net to NET_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:31:16 -07:00
Pavel Emelyanov
4e6734447d mib: add net to NET_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:30:14 -07:00
Pavel Emelyanov
1ed834655a tcp: replace tcp_sock argument with sock in some places
These places have a tcp_sock, but we'd prefer the sock itself to
get net from it. Fortunately, tcp_sk macro is just a type cast, so
this replace is really cheap.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:29:51 -07:00
Pavel Emelyanov
ca12a1a443 inet: prepare net on the stack for NET accounting macros
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:28:42 -07:00
Pavel Emelyanov
5c52ba170f sock: add net to prot->enter_memory_pressure callback
The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't
have where to get the net from.

I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure 
callback itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:28:10 -07:00
Pavel Emelyanov
74688e487a mib: add net to TCP_DEC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:46 -07:00
Pavel Emelyanov
63231bddf6 mib: add net to TCP_INC_STATS_BH
Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on 
the stack.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:25 -07:00
Pavel Emelyanov
81cc8a75d9 mib: add net to TCP_INC_STATS
Fortunately (almost) all the TCP code has a sock to get the net from :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:22:04 -07:00
Pavel Emelyanov
a9c19329ec tcp: add net to tcp_mib_init
This one sets TCP MIBs after zeroing them, and thus requires
the net.

The existing single caller can use init_net (temporarily).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:21:42 -07:00
Pavel Emelyanov
a86b1e3019 inet: prepare struct net for TCP MIB accounting
This is the same as the first patch in the set, but preparing
the net for TCP_XXX_STATS - save the struct net on the stack
where required and possible.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:58 -07:00
Pavel Emelyanov
c5346fe396 mib: add net to IP_ADD_STATS_BH
Very simple - only ip_evictor (fragments) requires such.
This patch ends up the IP_XXX_STATS patching.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:33 -07:00
Pavel Emelyanov
7c73a6faff mib: add net to IP_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:20:11 -07:00
Pavel Emelyanov
5e38e27044 mib: add net to IP_INC_STATS
All the callers already have either the net itself, or the place
where to get it from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:19:49 -07:00
Pavel Emelyanov
84a3aa000e ipv4: prepare net initialization for IP accounting
Some places, that deal with IP statistics already have where to
get a struct net from, but use it directly, without declaring
a separate variable on the stack.

So, save this net on the stack for future IP_XXX_STATS macros.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:19:08 -07:00
Will Newton
70efce27fc net/ipv4/tcp.c: Fix use of PULLHUP instead of POLLHUP in comments.
Change PULLHUP to POLLHUP in tcp_poll comments and clean up another
comment for grammar and coding style.

Signed-off-by: Will Newton <will.newton@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:13:43 -07:00
Harvey Harrison
7b1c65faa2 net: make __skb_splice_bits static
net/core/skbuff.c:1335:5: warning: symbol '__skb_splice_bits' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:12:30 -07:00
David S. Miller
885a4c966b Merge branch 'stealer/ipvs/sync-daemon-cleanup-for-next' of git://git.stealer.net/linux-2.6 2008-07-16 20:07:06 -07:00
Rumen G. Bogdanovski
9d3a0de7dc ipvs: More reliable synchronization on connection close
This patch enhances the synchronization of the closing connections
between the master and the backup director. It prevents the closed
connections to expire with the 15 min timeout of the ESTABLISHED
state on the backup and makes them expire as they would do on the
master with much shorter timeouts.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-16 20:04:23 -07:00
Sven Wegener
375c6bbabf ipvs: Use schedule_timeout_interruptible() instead of msleep_interruptible()
So that kthread_stop() can wake up the thread and we don't have to wait one
second in the worst case for the daemon to actually stop.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-07-16 22:33:20 +00:00
Sven Wegener
ba6fd85021 ipvs: Put backup thread on mcast socket wait queue
Instead of doing an endless loop with sleeping for one second, we now put the
backup thread onto the mcast socket wait queue and it gets woken up as soon as
we have data to process.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-07-16 22:33:20 +00:00
Sven Wegener
998e7a7680 ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()
This also moves the setup code out of the daemons, so that we're able to
return proper error codes to user space. The current code will return success
to user space when the daemon is started with an invald mcast interface. With
these changes we get an appropriate "No such device" error.

We longer need our own completion to be sure the daemons are actually running,
because they no longer contain code that can fail and kthread_run() takes care
of the rest.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-07-16 22:33:20 +00:00
Sven Wegener
e6dd731c75 ipvs: Use ERR_PTR for returning errors from make_receive_sock() and make_send_sock()
The additional information we now return to the caller is currently not used,
but will be used to return errors to user space.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-07-16 22:33:19 +00:00
Sven Wegener
d56400504a ipvs: Initialize mcast addr at compile time
There's no need to do it at runtime, the values are constant.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
2008-07-16 22:33:19 +00:00
Trond Myklebust
cadc723cc1 Merge branch 'bkl-removal' into next 2008-07-15 18:34:58 -04:00
Trond Myklebust
e89e896d31 Merge branch 'devel' into next
Conflicts:

	fs/nfs/file.c

Fix up the conflict with Jon Corbet's bkl-removal tree
2008-07-15 18:34:16 -04:00
Trond Myklebust
a86dc496b7 SUNRPC: Remove the BKL from the callback functions
Push it into those callback functions that actually need it.

Note that all the NFS operations use their own locking, so don't need the
BKL. Ditto for the rpcbind client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:57 -04:00
Chuck Lever
c2e1b09ff2 SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.

Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon.  The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP.  All of this functionality is exposed via
the new rpcb_v4_register() kernel API.

A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.

Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:55 -04:00
Chuck Lever
babe80eb49 SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:51 -04:00
Chuck Lever
423d8b0647 SUNRPC: None of rpcb_create's callers wants a privileged source port
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:47 -04:00
Chuck Lever
cc5598b78f SUNRPC: Introduce a specific rpcb_create for contacting localhost
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.

Ensure any errors encountered by rpcb_create_local() are properly
reported.

We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:44 -04:00
Chuck Lever
166b88d755 SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
The rpcbind versions 3 and 4 SET and UNSET procedures use the same
arguments as the GETADDR procedure.

While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET.  But this will change
in just a moment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:08:40 -04:00
Linus Torvalds
59190f4213 Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  generic-ipi: more merge fallout
  generic-ipi: merge fix
  x86, visws: use mach-default/entry_arch.h
  x86, visws: fix generic-ipi build
  generic-ipi: fixlet
  generic-ipi: fix s390 build bug
  generic-ipi: fix linux-next tree build failure
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
  fix "smp_call_function: get rid of the unused nonatomic/retry argument"
  on_each_cpu(): kill unused 'retry' parameter
  smp_call_function: get rid of the unused nonatomic/retry argument
  sh: convert to generic helpers for IPI function calls
  parisc: convert to generic helpers for IPI function calls
  mips: convert to generic helpers for IPI function calls
  m32r: convert to generic helpers for IPI function calls
  arm: convert to generic helpers for IPI function calls
  alpha: convert to generic helpers for IPI function calls
  ia64: convert to generic helpers for IPI function calls
  powerpc: convert to generic helpers for IPI function calls
  ...

Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15 14:12:03 -07:00
Ingo Molnar
1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Ingo Molnar
6c9fcaf2ee Merge branch 'core/rcu' into core/rcu-for-linus 2008-07-15 21:10:12 +02:00
Akinobu Mita
d0236f8f82 iucv: fix memory leak in cpu hotplug error path.
Fix memory leak in error path in CPU_UP_PREPARE notifier.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 02:09:53 -07:00
Octavian Purdila
2870c43d17 net: refactor tcp splice receive path to improve readability
- move all of the details on offsets, lengths and buffers into a
single function instead of doing these operation from multiple places

- use a bottom up approach: try to avoid details in the high level
functions, introduce them gradually as we go deeper in the function
call stack

With helpful feedback from Jarek Poplawski.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:49:11 -07:00
David S. Miller
b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller
e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
David S. Miller
f1f28aa351 netdev: Add addr_list_lock to struct net_device.
This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state such as ->set_rx_mode() and ->set_multicast_list().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:08:33 -07:00
Pavel Emelyanov
f66ac03d49 mib: add struct net to ICMPMSGIN_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:31 -07:00
Pavel Emelyanov
903fc1964e mib: add struct net to ICMPMSGOUT_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:30 -07:00
Pavel Emelyanov
dcfc23cac1 mib: add struct net to ICMP_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:29 -07:00
Pavel Emelyanov
75c939bb4d mib: add struct net to ICMP_INC_STATS
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:28 -07:00
Pavel Emelyanov
fd54d716b1 inet: toss struct net initialization around
Some places, that deal with ICMP statistics already have where
to get a struct net from, but use it directly, without declaring
a separate variable on the stack.

Since I will need this net soon, I declare a struct net on the
stack and use it in the existing places in a separate patch not
to spoil the future ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:26 -07:00
Pavel Emelyanov
0388b00426 icmp: add struct net argument to icmp_out_count
This routine deals with ICMP statistics, but doesn't have a
struct net at hands, so add one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 23:05:13 -07:00
Patrick McHardy
61362766d7 vlan: remove unnecessary include statements
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:55 -07:00
Patrick McHardy
d80aa31bbf vlan: clean up hard_start_xmit functions
Remove excessive comments and debugging, use NETDEV_TX codes,
remove some empty lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:39 -07:00
Patrick McHardy
1349fe9a6b vlan: clean up vlan_dev_hard_header()
Remove some debugging and excessive comments, merge the two
dev_hard_header calls into one.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:19 -07:00
Patrick McHardy
19b9a4e256 vlan: ethtool ->get_flags support
Allow to query LRO settings of underlying device when VLAN RX
acceleration is used.

Suggested by Ben Hutchings <bhutchings@solarflare.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:51:01 -07:00
Patrick McHardy
393e52e33c packet: deliver VLAN TCI to userspace
Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can
properly deal with hardware VLAN tagging/stripping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:39 -07:00
Patrick McHardy
bbd6ef87c5 packet: support extensible, 64 bit clean mmaped ring structure
The tpacket_hdr is not 64 bit clean due to use of an unsigned long
and can't be extended because the following struct sockaddr_ll needs
to be at a fixed offset.

Add support for a version 2 tpacket protocol that removes these
limitations.

Userspace can query the header size through a new getsockopt option
and change the protocol version through a setsockopt option. The
changes needed to switch to the new protocol version are:

1. replace struct tpacket_hdr by struct tpacket2_hdr
2. query header len and save
3. set protocol version to 2
 - set up ring as usual
4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
   instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:50:15 -07:00
Patrick McHardy
bc1d0411b8 vlan: deliver packets received with VLAN acceleration to network taps
When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.

Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:30 -07:00
Patrick McHardy
6aa895b047 vlan: Don't store VLAN tag in cb
Use a real skb member to store the skb to avoid clashes with qdiscs,
which are allowed to use the cb area themselves. As currently only real
devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
invalidation is neccessary.

The new member fills a hole on 64 bit, the skb layout changes from:

        __u32                      mark;                 /*   172     4 */
        sk_buff_data_t             transport_header;     /*   176     4 */
        sk_buff_data_t             network_header;       /*   180     4 */
        sk_buff_data_t             mac_header;           /*   184     4 */
        sk_buff_data_t             tail;                 /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        sk_buff_data_t             end;                  /*   192     4 */

        /* XXX 4 bytes hole, try to pack */

to

        __u32                      mark;                 /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */

        /* XXX 2 bytes hole, try to pack */

        sk_buff_data_t             transport_header;     /*   180     4 */
        sk_buff_data_t             network_header;       /*   184     4 */

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:49:06 -07:00
Allan Stephens
968edbe1c8 tipc: Optimization to multicast name lookup algorithm
This patch simplifies and speeds up TIPC's algorithm for identifying
on-node and off-node destinations that overlap a multicast name
sequence range.  Rather than traversing the list of all known name
publications within the cluster, it now traverses the (potentially
much shorter) list of name publications made by the node itself, and
determines if any off-node destinations exist by comparing the sizes
of the two lists.  (Since the node list must be a subset of the
cluster list, a difference in sizes means that at least one off-node
destination must exist.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:45:33 -07:00
Allan Stephens
1aad72d6cd tipc: Add missing locks when inspecting node list & link list
This patch ensures that TIPC configuration commands that display info
about neighboring nodes and their links take the spinlocks that
protect the node list and link lists from changing while the lists
are being traversed.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:44:58 -07:00
Allan Stephens
08d2cf0f74 tipc: Fix bug in scope checking for multicast messages
This patch ensures that TIPC's multicast message name lookup
algorithm does individualized scope checking for each published
name it examines.  Previously, scope checking was only done for
the first name in a name table publication list, which could
result in incoming multicast messages being delivered to ports
publishing names with "node" scope, or not being delivered to
ports publishing names with "cluster" or "zone" scope.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:44:32 -07:00
Allan Stephens
0e35fd5e52 tipc: Eliminate improper use of TIPC_OK error code
This patch corrects many places where TIPC routines indicated
successful completion by returning TIPC_OK instead of 0.
(The TIPC_OK symbol has the value 0, but it should only be used
in contexts that deal with the error code field of a TIPC
message header.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:44:01 -07:00
Allan Stephens
2da59918e2 tipc: Fix race condition that could cause accept() to fail
This patch ensurs that accept() returns successfully even when
the newly created socket is immediately disconnected by its peer.
Previously, accept() would fail if it was unable to pass back
the optional address info for the socket's peer before the
socket became disconnected; TIPC now allows accept() to gather
peer address information after disconnection.  As a bonus, the
revised code accesses the socket's port more efficiently, without
the overhead incurred by a reference table lookup.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:43:32 -07:00
Allan Stephens
8642bd9e04 tipc: Optimize pointer dereferencing when receiving stream data
This patch eliminates an unnecessary pointer dereference when
accessing a stream-based socket's receive queue.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:42:51 -07:00
Allan Stephens
0ea522416b tipc: Remove unneeded parameter to tipc_createport_raw()
This patch eliminates an unneeded parameter when creating a low-level
TIPC port object.  Instead of returning both the pointer to the port
structure and the port's reference ID, it now returns only the pointer
since the port structure contains the reference ID as one of its fields.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:42:19 -07:00
Denis V. Lunev
83aa2e964b netlabel: return msg overflow error from netlbl_cipsov4_list faster
Currently, we are trying to place the information from the kernel to
1, 2, 3 and 4 pages sequentially. These pages are allocated via slab.
Though, from the slab point of view steps 3 and 4 are equivalent on
most architectures. So, lets skip 3 pages attempt.

By the way, should we switch from .doit to .dumpit interface here?
The amount of data seems quite big for me.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:28:25 -07:00
Johann Felix Soden
7197914c35 net: Remove references to wan-router.txt in Kconfigs
This patch removes references in drivers/net/wan/Kconfig and
net/wanrouter/Kconfig to Documentation/networking/wan-router.txt
which was removed in commit 99971e70fd
("[WANPIPE]: Forgotten bits of Sangoma drivers removal.").

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 22:22:29 -07:00
Wang Chen
89146504cb 8021q: Check return of dev_set_promiscuity/allmulti
dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check all positive increment for promiscuity and allmulti
to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-14 20:59:03 -07:00