linux-stable

Commit Graph

Author	SHA1	Message	Date
David S. Miller	d3b753db7c	pkt_sched: Move gso_skb into Qdisc. We liberate any dangling gso_skb during qdisc destruction. It really only matters for the root qdisc. But when qdiscs can be shared by multiple netdev_queue objects, we can't have the gso_skb in the netdev_queue any more. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	fd2ea0a79f	net: Use queue aware tests throughout. This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:07 -07:00
David S. Miller	e8a0464cc9	netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:00 -07:00
David S. Miller	79d16385c7	netdev: Move atomic queue state bits into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:14:46 -07:00
David S. Miller	c773e847ea	netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:13:53 -07:00
David S. Miller	eb6aafe3f8	pkt_sched: Make qdisc_run take a netdev_queue. This allows us to use this calling convention all the way down into qdisc_restart(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:12:38 -07:00
David S. Miller	86d804e10a	netdev: Make netif_schedule() routines work with netdev_queue objects. Only plain netif_schedule() remains taking a net_device, mostly as a compatability item while we transition the rest of these interfaces. Everything else calls netif_schedule_queue() or __netif_schedule(), both of which take a netdev_queue pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:11:25 -07:00
David S. Miller	970565bbad	netdev: Move gso_skb into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:10:33 -07:00
David S. Miller	68dfb42798	pkt_sched: Kill stats_lock member of struct Qdisc. It is always equal to qdisc->dev_queue->lock Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:57:31 -07:00
David S. Miller	b0e1e6462d	netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:42:10 -07:00
David S. Miller	555353cfa1	netdev: The ingress_lock member is no longer needed. Every qdisc is assosciated with a queue, and in the case of ingress qdiscs that will now be netdev->rx_queue so using that queue's lock is the thing to do. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:33:13 -07:00
David S. Miller	dc2b48475a	netdev: Move queue_lock into struct netdev_queue. The lock is now an attribute of the device queue. One thing to notice is that "suspicious" places emerge which will need specific training about multiple queue handling. They are so marked with explicit "netdev->rx_queue" and "netdev->tx_queue" references. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:18:23 -07:00
David S. Miller	5ce2d488fe	pkt_sched: Remove 'dev' member of struct Qdisc. It can be obtained via the netdev_queue. So create a helper routine, qdisc_dev(), to make the transformations nicer looking. Now, qdisc_alloc() now no longer needs a net_device pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:06:30 -07:00
David S. Miller	bb949fbd18	netdev: Create netdev_queue abstraction. A netdev_queue is an entity managed by a qdisc. Currently there is one RX and one TX queue, and a netdev_queue merely contains a backpointer to the net_device. The Qdisc struct is augmented with a netdev_queue pointer as well. Eventually the 'dev' Qdisc member will go away and we will have the resulting hierarchy: net_device --> netdev_queue --> Qdisc Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:55:56 -07:00
WANG Cong	01e123d79a	pkt_sched: ERR_PTR() ususally encodes an negative errno, not positive. Note, in the following patch, 'err' is initialized as: int err = -ENOBUFS; Signed-off-by: WANG Cong <wcong@critical-links.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:51:35 -07:00
Arjan van de Ven	b4192bbd85	net: Add a WARN_ON_ONCE() to the transmit timeout function WARN_ON_ONCE() gives a stack trace including the full module list. Having this in the kernel dump for the timeout case in the generic netdev watchdog will help us see quicker which driver is involved. It also allows us to collect statistics and patterns in terms of which drivers have this event occuring. Suggested by Andrew Morton Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:21:07 -07:00
Herbert Xu	2ba2506ca7	[NET]: Add preemption point in qdisc_run The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:25:26 -07:00
Patrick McHardy	1e90474c37	[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API Convert packet schedulers to use the netlink API. Unfortunately a gradual conversion is not possible without breaking compilation in the middle or adding lots of casts, so this patch converts them all in one step. The patch has been mostly generated automatically with some minor edits to at least allow seperate conversion of classifiers and actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:10 -08:00
Patrick McHardy	62e3ba1b55	[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:07 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Eric Dumazet	20fea08b5f	[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Peter P Waskiewicz Jr	5f1a485d59	[PKT_SCHED]: Check subqueue status before calling hard_start_xmit The only qdiscs that check subqueue state before dequeue'ing are PRIO and RR. The other qdiscs, including the default pfifo_fast qdisc, will allow traffic bound for subqueue 0 through to hard_start_xmit. The check for netif_queue_stopped() is done above in pkt_sched.h, so it is unnecessary for qdisc_restart(). However, if the underlying driver is multiqueue capable, and only sets queue states on subqueues, this will allow packets to enter the driver when it's currently unable to process packets, resulting in expensive requeues and driver entries. This patch re-adds the check for the subqueue status before calling hard_start_xmit, so we can try and avoid the driver entry when the queues are stopped. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 20:40:55 -08:00
Herbert Xu	ce0e32e65f	[NET]: Fix possible dev_deactivate race condition The function dev_deactivate is supposed to only return when all outstanding transmissions have completed. Unfortunately it is possible for store operations in the driver's transmit function to only become visible after dev_deactivate returns. This patch fixes this by taking the queue lock after we see the end of the queue run. This ensures that all effects of any previous transmit calls are visible. If however we detect that there is another queue run occuring, then we'll warn about it because this should never happen as we have pointed dev->qdisc to noop_qdisc within the same queue lock earlier in the functino. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 22:37:58 -07:00
Jeff Garzik	bfaae0f04c	[NET]: fix carrier-on bug? While looking at a net driver with the following construct, if (!netif_carrier_ok(dev)) netif_carrier_on(dev); it stuck me that the netif_carrier_ok() check was redundant, since netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway. This is the same reason why netif_queue_stopped() need not be called prior to netif_wake_queue(). This is true, but there is however an unwanted side effect from assuming that netif_carrier_on() can be called multiple times: it touches the watchdog, regardless of pre-existing carrier state. The fix: move watchdog-up inside the bit-cleared code path. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 23:26:43 -07:00
Jamal Hadi Salim	8236632fb3	[NET_SCHED]: explict hold dev tx lock For N cpus, with full throttle traffic on all N CPUs, funneling traffic to the same ethernet device, the devices queue lock is contended by all N CPUs constantly. The TX lock is only contended by a max of 2 CPUS. In the current mode of operation, after all the work of entering the dequeue region, we may endup aborting the path if we are unable to get the tx lock and go back to contend for the queue lock. As N goes up, this gets worse. The changes in this patch result in a small increase in performance with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3 showed similar behavior; Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:15 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device dev, int budget) to int foo_poll(struct napi_struct napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Patrick McHardy	0ba4805383	[NET_SCHED]: Remove unnecessary includes Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:41 -07:00
Patrick McHardy	876d48aabf	[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option The generic estimator is always built in anways and all the config options does is prevent including a minimal amount of code for setting it up. Additionally the option is already automatically selected for most cases. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:37 -07:00
Krishna Kumar	e50c41b53d	[NET]: qdisc_restart - couple of optimizations. Changes : - netif_queue_stopped need not be called inside qdisc_restart as it has been called already in qdisc_run() before the first skb is sent, and in __qdisc_run() after each intermediate skb is sent (note : we are the only sender, so the queue cannot get stopped while the tx lock was got in the ~LLTX case). - BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1 meant more packets are available, and __qdisc_run used to loop when qdisc_restart() returned -1. During those days, it was necessary to make sure that qlen is never less than zero, since __qdisc_run would get into an infinite loop if no packets are on the queue and this bug in qdisc was there (and worse - no more skbs could ever get queue'd as we hold the queue lock too). With Herbert's recent change to return values, this check is not required. Hopefully Herbert can validate this change. If at all this is required, it should be added to skb_dequeue (in failure case), and not to qdisc_qlen. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:36 -07:00
Krishna Kumar	6c1361a6f2	[NET]: qdisc_restart - readability changes plus one bug fix. New changes : - Incorporated Peter Waskiewicz's comments. - Re-added back one warning message (on driver returning wrong value). Previous changes : - Converted to use switch/case code which looks neater. - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless check should be removed, since driver will return NETDEV_TX_LOCKED only if lockless is true and driver has to do the locking. In the original code as well as the latest code, this code can result in a bug where if LLTX is not set for a driver (lockless == 0) but the driver is written wrongly to do a trylock (despite LLTX being set), the driver returns LOCKED. But since lockless is zero, the packet is requeue'd instead of calling collision code which will issue warning and free up the skb. Instead this skb will be retried with this driver next time, and the same result will ensue. Removing this check will catch these driver bugs instead of hiding the problem. I am keeping this change to readability section since : a. it is confusing to check two things as it is; and b. it is difficult to keep this check in the changed 'switch' code. - Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is the work being done and easier to understand) and do_dev_requeue to dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to dev_handle_collision (handle_dev_cpu_collision is a small routine with only one caller, so there is no need to have two separate routines which also results in getting rid of two macros, etc. - Removed an XXX comment as it should never fail (I suspect this was related to batch skb WIP, Jamal ?). Converted some functions to original coding style of having the return values and the function name on same line, eg prio2list. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:35 -07:00
Jamal Hadi Salim	c716a81ab9	[NET_SCHED]: Cleanup readability of qdisc restart Over the years this code has gotten hairier. Resulting in many long discussions over long summer days and patches that get it wrong. This patch helps tame that code so normal people will understand it. Thanks to Thomas Graf, Peter J. waskiewicz Jr, and Patrick McHardy for their valuable reviews. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:06:16 -07:00
Venkatesh Pallipadi	60468d5b5b	[NET]: Make net watchdog timers 1 sec jiffy aligned. round_jiffies for net dev watchdog timer. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:46 -07:00
Herbert Xu	36247f5421	[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:43 -07:00
Herbert Xu	41a23b0788	[NET_SCHED]: Avoid requeue warning on dev_deactivate When we relinquish queue_lock in qdisc_restart and then retake it for requeueing, we might race against dev_deactivate and end up requeueing onto noop_qdisc. This causes a warning to be printed. This patch fixes this by checking this before we requeue. As an added bonus, we can remove the same check in __qdisc_run which was added to prevent dev->gso_skb from being requeued when we're shutting down. Even though we've had to add a new conditional in its place, it's better because it only happens on requeues rather than every single time that qdisc_run is called. For this to work we also need to move the clearing of gso_skb up in dev_deactivate as now qdisc_restart can occur even after we wait for __LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long as the queue and gso_skb is already clear). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:42 -07:00
Herbert Xu	cce1fa36a8	[NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OK Now that we return the queue length after NETDEV_TX_OK we better make sure that we have the right queue. Otherwise we can cause a stall after a really quick dev_deactive/dev_activate. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:41 -07:00
Herbert Xu	d90df3ad07	[NET_SCHED]: Rationalise return value of qdisc_restart The current return value scheme and associated comment was invented back in the 20th century when we still had that tbusy flag. Things have changed quite a bit since then (even Tony Blair is moving on now, not to mention the new French president). All we need to indicate now is whether the caller should continue processing the queue. Therefore it's sufficient if we return 0 if we want to stop and non-zero otherwise. This is based on a patch by Krishna Kumar. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:40 -07:00
Thomas Graf	5830725f8a	[NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED case When transmit fails with NETDEV_TX_LOCKED the skb is requeued to dev->qdisc again. The dev->qdisc pointer is protected by the queue lock which needs to be dropped when attempting to transmit and acquired again before requeing. The problem is that qdisc_restart() fetches the dev->qdisc pointer once and stores it in the `q' variable which is invalidated when dropping the queue_lock, therefore the variable needs to be refreshed before requeueing. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:39 -07:00
Patrick McHardy	fd44de7cc1	[NET_SCHED]: ingress: switch back to using ingress_lock Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress needs to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:08 -07:00
Patrick McHardy	0463d4ae25	[NET_SCHED]: Eliminate qdisc_tree_lock Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:07 -07:00
YOSHIFUJI Hideaki	10297b9931	[NET] SCHED: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:08 -08:00
Arjan van de Ven	f5a6e01c09	[NET]: user of the jiffies rounding code: Networking This patch introduces users of the round_jiffies() function in the networking code. These timers all were of the "about once a second" or "about once every X seconds" variety and several showed up in the "what wakes the cpu up" profiles that the tickless patches provide. Some timers are highly dynamic based on network load; but even on low activity systems they still show up so the rounding is done only in cases of low activity, allowing higher frequency timers in the high activity case. The various hardware watchdogs are an obvious case; they run every 2 seconds but aren't otherwise specific of exactly when they need to run. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:52 -08:00
Adrian Bunk	5f68e4c07c	[PKT_SCHED]: Remove unused exports. This patch removes the following unused EXPORT_SYMBOL's: - sch_api.c: qdisc_lookup - sch_generic.c: __netdev_watchdog_up - sch_generic.c: noop_qdisc_ops - sch_generic.c: qdisc_alloc Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:06 -08:00
Patrick McHardy	9f9afec482	[NET_SCHED]: Set parent classid in default qdiscs Set parent classids in default qdiscs to allow walking up the tree from outside the qdiscs. This is needed by the next patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:41 -08:00
Patrick McHardy	85670cc1fa	[NET_SCHED]: Fix fallout from dev->qdisc RCU change The move of qdisc destruction to a rcu callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur. The two assumptions were: - since changes only happen in process context, read_lock doesn't need bottem half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottem half context. - since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (f.e. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free. Instead of "fixing" this by disabling bottem halfs everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the rcu callback to make sure dev_queue_xmit doesn't access already freed memory. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:50 -07:00
Herbert Xu	d7811e623d	[NET]: Drop tx lock in dev_watchdog_up Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM. On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote: > > ======================================================= > [ INFO: possible circular locking dependency detected ] > ------------------------------------------------------- > swapper/0 is trying to acquire lock: > (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290 > > but task is already holding lock: > (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290 > > which lock already depends on the new lock. This turns out to be a genuine bug. The queue lock and xmit lock are intentionally taken out of order. Two things are supposed to prevent dead-locks from occuring: 1) When we hold the queue_lock we're supposed to only do try_lock on the tx_lock. 2) We always drop the queue_lock after taking the tx_lock and before doing anything else. > > the existing dependency chain (in reverse order) is: > > -> #1 (&dev->_xmit_lock){-+..}: > [<c012e7b6>] lock_acquire+0x76/0xa0 > [<c0336241>] _spin_lock_bh+0x31/0x40 > [<c02d25a9>] dev_activate+0x69/0x120 This path obviously breaks assumption 1) and therefore can lead to ABBA dead-locks. I've looked at the history and there seems to be no reason for the lock to be held at all in dev_watchdog_up. The lock appeared in day one and even there it was unnecessary. In fact, people added __dev_watchdog_up precisely in order to get around the tx lock there. The function dev_watchdog_up is already serialised by rtnl_lock since its only caller dev_activate is always called under it. So here is a simple patch to remove the tx lock from dev_watchdog_up. In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and replace it with dev_watchdog_up. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-18 00:22:30 -07:00
Panagiotis Issaris	0da974f4f3	[NET]: Conversions from kmalloc+memset to k(z\|c)alloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:51:30 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Herbert Xu	f6a78bfcb1	[NET]: Add generic segmentation offload This patch adds the infrastructure for generic segmentation offload. The idea is to tap into the potential savings of TSO without hardware support by postponing the allocation of segmented skb's until just before the entry point into the NIC driver. The same structure can be used to support software IPv6 TSO, as well as UFO and segmentation offload for other relevant protocols, e.g., DCCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 02:07:31 -07:00
Herbert Xu	d4828d85d1	[NET]: Prevent transmission after dev_deactivate The dev_deactivate function has bit-rotted since the introduction of lockless drivers. In particular, the spin_unlock_wait call at the end has no effect on the xmit routine of lockless drivers. With a little bit of work, we can make it much more useful by providing the guarantee that when it returns, no more calls to the xmit routine of the underlying driver will be made. The idea is simple. There are two entry points in to the xmit routine. The first comes from dev_queue_xmit. That one is easily stopped by using synchronize_rcu. This works because we set the qdisc to noop_qdisc before the synchronize_rcu call. That in turn causes all subsequent packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all outstanding calls leave their critical section. The other entry point is from qdisc_run. Since we now have a bit that indicates whether it's running, all we have to do is to wait until the bit is off. I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is useless because netif_wake_queue can cause it to be set again. It is also harmless because we've disarmed qdisc_run. I've also removed the spin_unlock_wait on xmit_lock because its only purpose of making sure that all outstanding xmit_lock holders have exited is also given by dev_watchdog_down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 02:07:26 -07:00

1 2

65 Commits