Commit Graph

6637 Commits

Author SHA1 Message Date
Pavel Emelyanov df97c708d5 [NET]: Eliminate unused argument from sk_stream_alloc_pskb
The 3rd argument is always zero (according to grep :) Eliminate
it and merge the function with sk_stream_alloc_skb.

This saves 44 more bytes, and together with the previous patch
we have:

add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
function                                     old     new   delta
sk_stream_alloc_skb                            -     183    +183
ip_rt_init                                   529     525      -4
arp_ignore                                   112     107      -5
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2481    -102
tcp_sendpage                                1449    1300    -149
tso_fragment                                 417     258    -159
tcp_fragment                                1149     988    -161
__tcp_push_pending_frames                   1998    1837    -161

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:08 -08:00
Pavel Emelyanov f561d0f27d [NET]: Uninline the sk_stream_alloc_pskb
This function seems too big for inlining. Indeed, it saves
half-a-kilo when uninlined:

add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524)
function                                     old     new   delta
sk_stream_alloc_pskb                           -     195    +195
ip_rt_init                                   529     525      -4
__inet_lookup_listener                       284     274     -10
tcp_sendmsg                                 2583    2486     -97
tcp_sendpage                                1449    1305    -144
tso_fragment                                 417     267    -150
tcp_fragment                                1149     992    -157
__tcp_push_pending_frames                   1998    1841    -157

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:07 -08:00
Joonwoo Park 3015a347dc [IPV4] fib_hash: kmalloc + memset conversion to kzalloc
fib_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:07 -08:00
Joonwoo Park 88f8349164 [IPV4] fib_semantics: kmalloc + memset conversion to kzalloc
fib_semantics: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:06 -08:00
Joonwoo Park dcaee95a1b [IPSEC]: kmalloc + memset conversion to kzalloc
2007/11/26, Patrick McHardy <kaber@trash.net>:
> How about also switching vmalloc/get_free_pages to GFP_ZERO
> and getting rid of the memset entirely while you're at it?
>

xfrm_hash: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:05 -08:00
Ilpo Järvinen 8512430e55 [TCP]: Move FRTO checks out from write queue abstraction funcs
Better place exists in update_send_head (other non-queue related
adjustments are done there as well) which is the only caller of
tcp_advance_send_head (now that the bogus call from mtu_probe is
gone).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:05 -08:00
Pavel Emelyanov 82d8a867ff [NET]: Make macro to specify the ptype_base size
Currently this size is 16, but as the comment says this
is so only because all the chains (except one) has the
length 1. I think, that some day this may change, so
growing this hash will be much easier.

Besides, symbolic names are read better than magic constants.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:04 -08:00
Pavel Emelyanov 8d8ad9d7c4 [NET]: Name magic constants in sock_wake_async()
The sock_wake_async() performs a bit different actions
depending on "how" argument. Unfortunately this argument
ony has numerical magic values.

I propose to give names to their constants to help people
reading this function callers understand what's going on
without looking into this function all the time.

I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:03 -08:00
Gerrit Renker ce865a61c8 [DCCP]: Add support for abortive release
This continues from the previous patch and adds support for actively aborting
a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
abortive release.

I have tried this in various client/server settings and it works as expected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:02 -08:00
Gerrit Renker d83bd95bf1 [DCCP]: Check for unread data on close
This removes one FIXME with regard to close when there is still unread data.
The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
a Reset with Code 2, "Aborted" is sent to the peer.

This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker dcfbc7e97a [CCID2]: Remove misleading comment
This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none.
The comment assumes it is possible that a packet is sent between the calls to

	ccid_hc_tx_send_packet(),
	dccp_transmit_skb(),
	ccid_hc_tx_packet_sent()

(in the above order) in dccp_write_xmit().

I think that this is impossible, since dccp_write_xmit() is always called under lock:

 * when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked
   (see code comment above dccp_send_close());
 * when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called;
 * when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called
   and the if/else statement has made sure that sk_lock.owner is not set;
 * there are no other places where dccp_write_xmit() is called.

Furthermore, the debug statement for printing the sequence number of the packet just sent has been
removed, since the entire list is being printed anyway and so the entry of that number appears last.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker a302002516 [CCID2]: Remove redundant ack-counting variable
The code used two different variables to count Acks, one of them redundant.
This patch reduces the number of Ack counters to one.

The type of the Ack counter has also been changed to u32 (twice the range of int);
and the variable has been renamed into `packets_acked' - for consistency with
RFC 3465 (and similarly named variables are used by TCP and SCTP).

Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios,
maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:00 -08:00
Gerrit Renker 83399361c3 [CCID2]: Remove redundant synchronisation variable
This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1
when the CCID2 sender may send a new packet, and which is set to 0 otherwise

The variable is redundant, since it is only used in combination with the hc_tx_send_packet/
hc_tx_packet_sent function pair. Both functions are called under socket lock, so the
following happens when the CCID2 may send a new packet:

 * it sets sendwait = 1 in tx_send_packet and returns 0;
 * the subsequent call to tx_packet_sent clears the sendwait flag;
 * since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition
   in tx_packet_sent is never satisfied, since that function is never called when
   tx_send_packet returns a value different from 0 (cf. dccp_write_xmit);
 * the call to tx_packet_sent clears the flag so that the condition "!sendwait" is
   true the next time tx_packet_sent is called.

In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet
and tx_packet_sent -- which is what the patch does.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker da98e0b5d4 [CCID2]: Redundant debugging output
This reduces the amount of redundant debugging messages:

 * pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent().
   Both functions are called immediately after one another, so one occurrence is sufficient.

 * Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant.

 * In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker 95b21d7e9d [CCID2]: Replace pipe assignment-function with assignment
The function ccid2_change_pipe only does an assignment. This patch simplifies the code by
replacing the function with the assignment it performs.

Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range).
As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed,
but replaced with a less annoying `DCCP_BUG').

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:58 -08:00
Gerrit Renker 3deeadd74b [CCID2]: Replace cwnd assignment-function with assignment
The current function ccid2_change_cwnd in effect makes only an assignment, as
the test whether cwnd has reached 0 is only required when cwnd is halved.

This patch simplifies the code by replacing the function with the assignment
it performs.

Furthermore, since ssthresh derives from cwnd and appears in many assignments and
comparisons, the type of ssthresh has also been changed to match that of cwnd.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker 63df18ad7f [CCID2]: Replace read-only variable with constant
This replaces the field member `numdupack', which was used as a read-only
constant in the code, with a #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker 7792cd8885 [CCID2]: Remove unused variable
This removes a variable `ccid2hctx_sent' which is incremented but
never referenced/read (i.e., dead code).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:56 -08:00
Gerrit Renker 900bfed471 [CCID2]: Disable broken Ack Ratio adaptation algorithm
This comments out a problematic section comprising a half-finished algorithm:

 - The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and
   hence in fact is a read-only constant.
 - The `arsent' variable counts packets other than Acks (it is incremented for every packet),
   and there is no test for Ack Loss.
 - The concept of counting Acks as such leads to a complex calculation, and the calculation
   at the moment is inconsistent with this concept.
   The problem is that the number of Acks - rather than the number of windows - is counted,
   which leads to a complex (cubic/quadratic) expression - this is not even implemented.

In its current state, the commented-out algorithm interfers with normal processing by
changing Ack Ratio incorrectly, and at the wrong times.

A new algorithm is necessary, which will not necessarily use the same variables as used by
the unfinished one; hence the old variables have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Gerrit Renker b00d2bbc45 [CCID2]: Larger initial windows also for CCID2
RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most
four packets for new connections, following the rules from [RFC3390]", which
is implemented by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Arnaldo Carvalho de Melo e18d7a9857 [DCCP]: Initialize dccp_sock before calling the ccid constructors
This is because in the next patch CCID2 will assume that dccps_mss_cache is
non-zero.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:54 -08:00
Gerrit Renker d50ad163e6 [CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd
This patch removes a bug in the current code. I agree with Andrea's comment
that there is a problem here but the way it is treated does not fix it.

The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs:
 * the receiver will not send an Ack until (Ack Ratio - cwnd) data packets
   have arrived;
 * the sender will not send any data packet before the receipt of an Ack
   advances the send window.
The only way that the connection then progresses was via RTO timeout. In one
extreme case (bulk transfer), it was observed that this happened for every single
packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart:
a transfer which normally would take a fraction of a second thus grew to
several minutes.

The solution taken by this approach is to observe the relation

                   "Ack Ratio <= cwnd"

by using the constraint (1) from RFC 4341, 6.1.2; i.e. set

                 Ack Ratio = ceil(cwnd / 2)

and update it whenever either Ack Ratio or cwnd change. This ensures that
the deadlock problem can not arise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker df054e1d00 [CCID2]: Don't assign negative values to Ack Ratio
Since it makes not sense to assign negative values to Ack Ratio, this
patch disallows this possibility.

As a consequence, a Bug test for negative Ack Ratio values becomes obsolete.

Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes,
due to RFC 4340, 11.3) has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker cfbbeabc88 [CCID2]: Fix sequence number arithmetic/comparisons
This replaces use of normal subtraction with modulo-48 subtraction.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:52 -08:00
Gerrit Renker 3de5489f47 [CCID2]: Bug in reading Ack Vectors
In CCID2 the receiver-history is sorted in ascending order of sequence number,
but the processing of received Ack Vectors requires the list traversal in the
opposite direction.

The current code has a bug in this regard: the list traversal is upwards. As a
consequence, only Ack Vectors with a run length of 1 will pass, in all other
Ack Vectors the remaining (acked) sequence numbers are missed, and may later
falsely be identified as lost.

Note: This bug is only visible when Ack Ratio > 1, since otherwise the run
      lengths of Ack Vectors are 0.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Gerrit Renker a47c51044a [ACKVEC]: Reduce length of identifiers
This is reduces the length of the struct ackvec/ackvec_record fields. It is
a purely text-based replacement:

	s#dccpavr_#avr_#g;
	s#dccpav_#av_#g;

and increases readability somewhat.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Pavel Emelyanov f126734735 [IPV6]: Correct the comment concerning inetsw6 table
It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c,
but one comment was not fixed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Pavel Emelyanov a53eb3feb2 [UNIX] Move the unix sock iterators in to proper place
The first_unix_socket() and next_unix_sockets() are now used
in proc file and in forall_unix_socets macro only.

The forall_unix_sockets is not used in this file at all so
remove it. After this move the helpers to where they really
belong, i.e. closer to proc code under the #ifdef CONFIG_PROC_FS
option.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Gerrit Renker c86ab2b6a5 [DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also
Small update with regard to RFC 4340 (references added as documentation):
on Requests, Ack Vectors / Elapsed Time should be ignored.
Length handling of Elapsed Time also simplified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:47 -08:00
Gerrit Renker 6d57b43bf8 [DCCP]: Remove redundant dependency on IP_DCCP
This cleans up the consequences of an earlier patch which
introduced the `if IP_DCCP' clause into net/dccp/Kconfig.

The CCID Kconfig menu is sourced within this clause; as a
consequence, all tests of type `depends on IP_DCCP' are now
redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker e333b3edc4 [DCCP]: Promote CCID2 as default CCID
This patch addresses the following problems:

 1. DCCP relies for its proper functioning on having at least one CCID module
    enabled (as in TCP plugable congestion control). Currently it is possible to
    disable both CCIDs and thus leave the DCCP module in a compiled, but entirely
    non-functional state: no sockets can be created when no CCID is available.
    Furthermore, the protocol is (again like TCP) not intended to be used without
    CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation.

 2. Internally the default CCID that is advertised by the Linux host is set to CCID2
    (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig
    menu without changing the defaults leads to a failure `module not found' when
    trying to load the dccp module (which internally tries to load the default CCID).

 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a
    `minimum common denominator'; the specification says that:

    * "New connections start with CCID 2 for both endpoints"

    * "A DCCP implementation intended for general use, such as an implementation in a
       general-purpose operating system kernel, SHOULD implement at least CCID 2.
       The intent is to make CCID 2 broadly available for interoperability [...]"

    Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable.

Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added.

Discussions with Ian McDonald on this subject are gratefully acknowledged.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker 8e8c71f1ab [DCCP]: Honour and make use of shutdown option set by user
This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.

This patch exploits the information provided by the user via the socket API to reduce
processing costs:
 * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
 * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
   as long as the TX queue has not been drained, we need to receive feedback to keep
   congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
   packet (under congestion control) has been sent;
 * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
   as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).

Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
to this by the present patch.

There will also no longer be any delivery when the socket is in the final stages, i.e. when
one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
that stage the connection is its final stages.

Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown

A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).

Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
      sure the socket is a TCP socket". This should probably be extended to mean
      `TCP or DCCP socket' (the code is also used by UDP and raw sockets).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:44 -08:00
Gerrit Renker c3ada46a00 [CCID3]: Inline for moving average
The moving average computation occurs so frequently in the CCID 3 code that
it merits an inline function  of its own. This is uses a suggestion by
Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:43 -08:00
Gerrit Renker a5358fdc9c [CCID3]: Accurately determine idle & application-limited periods
This fixes/updates the handling of idle and application-limited periods in CCID3,
which currently is broken: there is no detection as to how long a sender has been
idle - there is only one flag which is toggled in between function calls.

Being obsolete now, the `idle' flag is removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker eb279b79c4 [CCID3]: Ignore trivial amounts of elapsed time
This patch fixes a previously undiscovered bug; the problem is in computing
the elapsed time as the time between `receiving' the packet (i.e. skb enters
CCID module) and sending feedback:

     - there is no layer-processing, queueing, or delay involved,
     - hence the elapsed time is in the order of 1 function call
     - this is in the dimension of maximally 50..100usec
     - which renders the use of elapsed time almost entirely useless.

The fix is simply to ignore such trivial amounts of elapsed time.

As a further advantage, the now useless elapsed_time field can be removed from
the socket, which reduces the socket structure by another four bytes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker 6c08b2cf48 [CCID3]: Revert use of MSS instead of s
This updates the CCID3 code with regard to two instances of using `MSS' in place of `s':

 1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart
    draft now consistently use `s' instead of MSS.

 2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when
    it does not yet have a round trip sample, the value of X is set to s bytes per
    second, for segment size s [...]"

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:41 -08:00
Arnaldo Carvalho de Melo ebb53d7565 [NET] proto: Use pcounters for the inuse field
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:40 -08:00
Johannes Berg 0c884439db mac80211: remove more forgotten code
Hopefully that's the rest. Seems I didn't do a very thorough job
removing the management interface.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:39 -08:00
Helmut Schaa 48933dea47 mac80211: Remove local->scan_flags
This patch removes all references to local->scan_flags as these are not
used anymore since the removal of prism2 ioctls.

Signed-off-by: Helmut Schaa <hschaa@suse.de>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:37 -08:00
Johannes Berg dabeb344f5 mac80211: provide interface iterator for drivers
Sometimes drivers need to know which interfaces are associated with
their hardware. Rather than forcing those drivers to keep track of
the interfaces that were added, this adds an iteration function to
mac80211.

As it is intended to be used from the interface add/remove callbacks,
the iteration function may currently only be called under RTNL.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:37 -08:00
Pavel Emelyanov 9859a79023 [NET]: Compact sk_stream_mem_schedule() code
This function references sk->sk_prot->xxx for many times.
It turned out, that there's so many code in it, that gcc
cannot always optimize access to sk->sk_prot's fields.

After saving the sk->sk_prot on the stack and comparing
disassembled code, it turned out that the function became
~10 bytes shorter and made less dereferences (on i386 and
x86_64). Stack consumption didn't grow.

Besides, this patch drives most of this function into the
80 columns limit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:36 -08:00
Benjamin Thery 3ef1355dcb [NET]: Make netns cleanup to run in a separate queue
This patch adds a separate workqueue for cleaning up a network
namespace. If we use the keventd workqueue to execute cleanup_net(),
there is a problem to unregister devices in IPv6. Indeed the code
that cleans up also schedule work in keventd: as long as cleanup_net()
hasn't return, dst_gc_task() cannot run and as long as dst_gc_task() has
not run, there are still some references pending on the net devices and
cleanup_net() can not unregister and exit the keventd workqueue.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:35 -08:00
Pavel Emelyanov 85b606800b [IPVS]: Relax the module get/put in ip_vs_app.c
Both try_module_get/module_put already handle the module == NULL
case, so no need in manual checking.

This patch fits both net-2.6 and net-2.6.25.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:35 -08:00
Adrian Bunk 02d45827fa [NET] net/core/request_sock.c: Remove unused exports.
This patch removes the following unused EXPORT_SYMBOL's:
- reqsk_queue_alloc
- __reqsk_queue_destroy
- reqsk_queue_destroy

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:33 -08:00
Eric Dumazet beb659bd8c [PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue
Every 600 seconds (ip_rt_secret_interval), a softirq flush of the
whole ip route cache is triggered. On loaded machines, this can starve
softirq for many seconds and can eventually crash.

This patch moves this flush to a workqueue context, using the worker
we intoduced in commit 39c90ece75 (IPV4:
Convert rt_check_expire() from softirq processing to workqueue.)

Also, immediate flushes (echo 0 >/proc/sys/net/ipv4/route/flush) are
using rt_do_flush() helper function, wich take attention to
rescheduling.

Next step will be to handle delayed flushes
("echo -1 >/proc/sys/net/ipv4/route/flush" or "ip route flush cache")

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:33 -08:00
Pavel Emelyanov 42a73808ed [RAW]: Consolidate proc interface.
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:32 -08:00
Pavel Emelyanov ab70768ec7 [RAW]: Consolidate proto->unhash callback
Same as the ->hash one, this is easily consolidated.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov 65b4c50b47 [RAW]: Consolidate proto->hash callback
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov b673e4dfc8 [RAW]: Introduce raw_hashinfo structure
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:30 -08:00
Pavel Emelyanov 69d6da0b0f [IPv6] RAW: Compact the API for the kernel
Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.

Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:29 -08:00