linux-stable

Author	SHA1	Message	Date
Adam Langley	4389dded77	tcp: Remove redundant checks when setting eff_sacks Remove redundant checks when setting eff_sacks and make the number of SACKs a compile time constant. Now that the options code knows how many SACK blocks can fit in the header, we don't need to have the SACK code guessing at it. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:07:02 -07:00
David S. Miller	4ae127d1b6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smc911x.c	2008-06-13 20:52:39 -07:00
David S. Miller	ec0a196626	tcp: Revert 'process defer accept as established' changes. This reverts two changesets, `ec3c0982a2` ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and the follow-on bug fix `9ae27e0adb` ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). This change causes several problems, first reported by Ingo Molnar as a distcc-over-loopback regression where connections were getting stuck. Ilpo Järvinen first spotted the locking problems. The new function added by this code, tcp_defer_accept_check(), only has the child socket locked, yet it is modifying state of the parent listening socket. Fixing that is non-trivial at best, because we can't simply just grab the parent listening socket lock at this point, because it would create an ABBA deadlock. The normal ordering is parent listening socket --> child socket, but this code path would require the reverse lock ordering. Next is a problem noticed by Vitaliy Gusev, he noted: ---------------------------------------- >--- a/net/ipv4/tcp_timer.c >+++ b/net/ipv4/tcp_timer.c >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > goto death; > } > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { >+ tcp_send_active_reset(sk, GFP_ATOMIC); >+ goto death; Here socket sk is not attached to listening socket's request queue. tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should release this sk) as socket is not DEAD. Therefore socket sk will be lost for freeing. ---------------------------------------- Finally, Alexey Kuznetsov argues that there might not even be any real value or advantage to these new semantics even if we fix all of the bugs: ---------------------------------------- Hiding from accept() sockets with only out-of-order data only is the only thing which is impossible with old approach. Is this really so valuable? My opinion: no, this is nothing but a new loophole to consume memory without control. 
---------------------------------------- So revert this thing for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-12 16:34:35 -07:00
Ilpo Järvinen	b79eeeb9e4	tcp: Reorganize tcp_sock to fill 64-bit holes & improve locality I tried to group recovery related fields nearby (non-CA_Open related variables, to be more accurate) so that one to three cachelines would not be necessary in CA_Open. These are now contiguously deployed: struct sk_buff_head out_of_order_queue; /* 1968 80 */ /* --- cacheline 32 boundary (2048 bytes) --- */ struct tcp_sack_block duplicate_sack[1]; /* 2048 8 */ struct tcp_sack_block selective_acks[4]; /* 2056 32 */ struct tcp_sack_block recv_sack_cache[4]; /* 2088 32 */ /* --- cacheline 33 boundary (2112 bytes) was 8 bytes ago --- */ struct sk_buff *highest_sack; /* 2120 8 */ int lost_cnt_hint; /* 2128 4 */ int retransmit_cnt_hint; /* 2132 4 */ u32 lost_retrans_low; /* 2136 4 */ u8 reordering; /* 2140 1 */ u8 keepalive_probes; /* 2141 1 */ /* XXX 2 bytes hole, try to pack */ u32 prior_ssthresh; /* 2144 4 */ u32 high_seq; /* 2148 4 */ u32 retrans_stamp; /* 2152 4 */ u32 undo_marker; /* 2156 4 */ int undo_retrans; /* 2160 4 */ u32 total_retrans; /* 2164 4 */ ...and they're then followed by URG slowpath & keepalive related variables. Head of the out_of_order_queue always needed for empty checks, if that's empty (and TCP is in CA_Open), following ~200 bytes (in 64-bit) shouldn't be necessary for anything. If only OFO queue exists but TCP is in CA_Open, selective_acks (and possibly duplicate_sack) are necessary besides the out_of_order_queue but the rest of the block again shouldn't be (ie., the other direction had losses). As the cacheline boundaries depend on many factors in the preceeding stuff, trying to align considering them doesn't make too much sense. Commented one ordering hazard. There are number of low utilized u8/16s that could be combined get 2 bytes less in total so that the hole could be made to vanish (includes at least ecn_flags, urg_data, urg_mode, frto_counter, nonagle). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-29 03:25:23 -07:00
Ilpo Järvinen	4b74944044	tcp: Make prior_ssthresh a u32 If previous window was above representable values of u16, strange things will happen if undo with the truncated value is called for. Alternatively, this could be fixed by some max trickery but that would limit undoing high-speed undos. Adds 16-bit hole but there isn't anything to fill it with. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 17:40:05 -07:00
Patrick McManus	ec3c0982a2	[TCP]: TCP_DEFER_ACCEPT updates - process as established Change TCP_DEFER_ACCEPT implementation so that it transitions a connection to ESTABLISHED after handshake is complete instead of leaving it in SYN-RECV until some data arrvies. Place connection in accept queue when first data packet arrives from slow path. Benefits: - established connection is now reset if it never makes it to the accept queue - diagnostic state of established matches with the packet traces showing completed handshake - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be enforced with reasonable accuracy instead of rounding up to next exponential back-off of syn-ack retry. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:33:01 -07:00
Ilpo Järvinen	68f8353b48	[TCP]: Rewrite SACK block processing & sack_recv_cache use Key points of this patch are: - In case new SACK information is advance only type, no skb processing below previously discovered highest point is done - Optimize cases below highest point too since there's no need to always go up to highest point (which is very likely still present in that SACK), this is not entirely true though because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing fails to take advantage of costly stored information in sack_recv_cache, most importantly, expected events such as cumulative ACK and new hole ACKs. Processing on such ACKs result in rather long walks building up latencies (which easily gets nasty when window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received, usually for cumulative ACK there's no new information at all, yet TCP walks whole queue unnecessary potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same whole the time (in all unearthly scenarios). Take advantage of this knowledge here and drop fastpath hint and use direct access to highest SACKed skb as a replacement. Effectively "special cased" fastpath is dropped. This change adds some complexity to introduce better coveraged "fastpath", though the added complexity should make TCP behave more cache friendly. 
The current ACK's SACK blocks are compared against each cached block individially and only ranges that are new are then scanned by the high constant walk. For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath "special case". DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	fd6dad616d	[TCP]: Earlier SACK block verification & simplify access to them Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	a47e5a988a	[TCP]: Convert highest_sack to sk_buff to allow direct access It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	f78a1b3892	[TCP]: Make snd_cwnd_cnt 32-bit Very little point of having 32-bit snd_cnwd if this is not 32-bit as well, as a number of snd_cwnd incrementation formulas assume that snd_cwnd_cnt can be at least as large as snd_cwnd. Whether 32-bit is useful was discussed when `e0ef57cc56` was made: http://marc.info/?l=linux-netdev&m=117218144409825&w=2 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:59:43 -07:00
Ilpo Järvinen	b08d6cb22c	[TCP]: Limit processing lost_retrans loop to work-to-do cases This addition of lost_retrans_low to tcp_sock might be unnecessary, it's not clear how often lost_retrans worker is executed when there wasn't work to do. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-11 17:36:13 -07:00
Ilpo Järvinen	c79e335716	[TCP]: Comment fastpath_cnt_hint off-by-one trap Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:47 -07:00
Ilpo Järvinen	13dae42631	[TCP]: Update comment about highest_sack validity This stale info came from the original idea, which proved to be unnecessarily complex, sacked_out > 0 is easy to do and that when it's going to be needed anyway (it _can_ be valid also when sacked_out == 0 but there's not going to be a guarantee about it for now). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:00 -07:00
Ilpo Järvinen	b5860bbac7	[TCP]: Tighten tcp_sock's belt, drop left_out It is easily calculable when needed and user are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:55 -07:00
Ilpo Järvinen	539d243fdd	[TCP]: Access to highest_sack obsoletes forward_cnt_hint In addition, added a reference about the purpose of the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:53 -07:00
Ilpo Järvinen	d738cd8fca	[TCP]: Add highest_sack seqno, points to globally highest SACK It is guaranteed to be valid only when !tp->sacked_out. In most cases this seqno is available in the last ACK but there is no guarantee for that. The new fast recovery loss marking algorithm needs this as entry point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:50 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	ab6a5bb6b2	[TCP]: Introduce tcp_hdrlen() and tcp_optlen() The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to avoid the longer, open coded equivalent. Ditched a no-op in bnx2 in the process. I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:24 -07:00
David S. Miller	e0ef57cc56	[TCP]: Make snd_cwnd_clamp a u32. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:21 -07:00
Eric Dumazet	54287cc178	[TCP]: Keep copied_seq, rcv_wup and rcv_next together. I noticed in oprofile study a cache miss in tcp_rcv_established() to read copied_seq. ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293 2.0400 */ 55493 0.0281 :ffffffff80400bc9: mov 0x4c8(%r12),%eax copied_seq 543103 0.2746 :ffffffff80400bd1: cmp 0x3e0(%r12),%eax rcv_nxt if (tp->copied_seq == tp->rcv_nxt && len - tcp_header_len <= tp->ucopy.len) { In this function, the cache line 0x4c0 -> 0x500 is used only for this reading 'copied_seq' field. rcv_wup and copied_seq should be next to rcv_nxt field, to lower number of active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...) As you suggested, I changed tcp_create_openreq_child() so that these fields are changed together, to avoid adding a new store buffer stall. Patch is 64bit friendly (no new hole because of alignment constraints) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:21 -07:00
Baruch Even	6f74651ae6	[TCP]: Seperate DSACK from SACK fast path Move DSACK code outside the SACK fast-path checking code. If the DSACK determined that the information was too old we stayed with a partial cache copied. Most likely this matters very little since the next packet will not be DSACK and we will find it in the cache. but it's still not good form and there is little reason to couple the two checks. Since the SACK receive cache doesn't need the data to be in host order we also remove the ntohl in the checking loop. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:49 -08:00
Arnaldo Carvalho de Melo	3a137d2065	[TCP]: Renove the __ prefix on the struct tcp_sock members As this struct is not userland visible at all. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:54 -08:00
Arnaldo Carvalho de Melo	2ff52f282c	[TCP]: Change tcp_header_len member in tcp_sock to u16 With this we eliminate the last hole in struct tcp_sock. End result: [acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c: struct tcp_sock | -4 tcp_header_len; from: int /* 1000(0) 4(0) */ to: u16 /* 1000(0) 2(0) */ 1 struct changed [acme@newtoy net-2.6.20]$ Now sizeof(tcp_sock) is just... [acme@newtoy net-2.6.20]$ pahole --sizes ../OUTPUT/qemu/net-2.6.20/net/ipv4/tcp.o | grep -w tcp_sock struct tcp_sock: 1500 0 1500 bytes ;-) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:53 -08:00
Al Viro	9981a0e36a	[NET]: Annotate checksums in on-the-wire packets. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:26 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
John Heffner	ae8064ac32	[TCP]: Bound TSO defer time This patch limits the amount of time you will defer sending a TSO segment to less than two clock ticks, or the time between two acks, whichever is longer. On slow links, deferring causes significant bursts. See attached plots, which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness causes significant jitter, tends to overflow queues early (bad for short queues), and makes delay-based congestion control more difficult. Deferring by a couple clock ticks I believe will have a relatively small impact on performance. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 20:36:48 -07:00
Al Viro	dddc93c05d	[TCP]: struct tcp_sock .pred_flags is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:05 -07:00
Al Viro	269bd27e66	[TCP]: struct tcp_sack_block annotations Some of the instances of tcp_sack_block are host-endian, some - net-endian. Define struct tcp_sack_block_wire identical to struct tcp_sack_block with u32 replaced with __be32; annotate uses of tcp_sack_block replacing net-endian ones with tcp_sack_block_wire. Change is obviously safe since for cc(1) __be32 is typedefed to u32. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:04 -07:00
Al Viro	46a97324a5	[IPV4]: TCP headers annotated Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:03 -07:00
David Woodhouse	c8a553ad7f	[TCP]: Move inclusion of <linux/dmaengine.h> to correct place in <linux/tcp.h> The new <linux/dmaengine.h> header shouldn't be included from the !__KERNEL__ portion of tcp.h Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 02:07:40 -07:00
Linus Torvalds	cee4cca740	Merge git://git.infradead.org/hdrcleanup-2.6 * git://git.infradead.org/hdrcleanup-2.6: (63 commits) [S390] __FD_foo definitions. Switch to __s32 types in joystick.h instead of C99 types for consistency. Add <sys/types.h> to headers included for userspace in <linux/input.h> Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h Remove struct fddi_statistics from user view in <linux/if_fddi.h> Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390 Revert include/media changes: Mauro says those ioctls are only used in-kernel(!) Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h> Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too Remove private struct dx_hash_info from public view in <linux/ext3_fs.h> Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h> Use __uXX types in <linux/divert.h> for struct divert_blk et al. Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible. Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely Use __uXX types in user-visible structures in <linux/nbd.h> Don't use 'u32' in user-visible struct ip_conntrack_old_tuple. Use __uXX types for S390 DASD volume label definitions which are user-visible S390 BIODASDREADCMB ioctl should use __u64 not u64 type. Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h> Fix private integer types used in V4L2 ioctls. ... Manually resolve conflict in include/linux/mtd/physmap.h	2006-06-20 15:10:08 -07:00
Chris Leech	97fc2f0848	[I/OAT]: Structure changes for TCP recv offload to I/OAT Adds an async_wait_queue and some additional fields to tcp_sock, and a dma_cookie_t to sk_buff. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:48 -07:00
David Woodhouse	62c4f0a2d5	Don't include linux/config.h from anywhere else in include/ Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2006-04-26 12:56:16 +01:00
John Heffner	0e7b13685f	[TCP] mtu probing: move tcp-specific data out of inet_connection_sock This moves some TCP-specific MTU probing state out of inet_connection_sock back to tcp_sock. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 21:32:58 -08:00
Arnaldo Carvalho de Melo	d83d8461f9	[IP_SOCKGLUE]: Remove most of the tcp specific calls As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo	2271281362	[TCP]: Move the TCPF_ enum to tcp_states.h Upcoming patches will make, for instance, ip_sockglue.c need just this enum and not all of tcp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:57 -08:00
Arnaldo Carvalho de Melo	8292a17a39	[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops And move it to struct inet_connection_sock. DCCP will use it in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:38 -08:00
Stephen Hemminger	6a438bbe68	[TCP]: speed up SACK processing Use "hints" to speed up the SACK processing. Various forms of this have been used by TCP developers (Web100, STCP, BIC) to avoid the 2x linear search of outstanding segments. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:14:59 -08:00
Stephen Hemminger	9772efb970	[TCP]: Appropriate Byte Count support This is an updated version of the RFC3465 ABC patch originally for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting bytes ack'd rather than packets when updating congestion control. The orignal ABC described in the RFC applied to a Reno style algorithm. For advanced congestion control there is little change after leaving slow start. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:09:53 -08:00
Arnaldo Carvalho de Melo	6687e988d9	[ICSK]: Move TCP congestion avoidance members to icsk This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(), minimal renaming/moving done in this changeset to ease review. Most of it is just changes of struct tcp_sock * to struct sock * parameters. With this we move to a state closer to two interesting goals: 1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used for any INET transport protocol that has struct inet_hashinfo and are derived from struct inet_connection_sock. Keeps the userspace API, that will just not display DCCP sockets, while newer versions of tools can support DCCP. 2. INET generic transport pluggable Congestion Avoidance infrastructure, using the current TCP CA infrastructure with DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:56:18 -07:00
Arnaldo Carvalho de Melo	295f7324ff	[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer With this we're very close to getting all of the current TCP refactorings in my dccp-2.6 tree merged, next changeset will export some functions needed by the current DCCP code and then dccp-2.6.git will be born! Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:49:29 -07:00
Arnaldo Carvalho de Melo	463c84b97f	[NET]: Introduce inet_connection_sock This creates struct inet_connection_sock, moving members out of struct tcp_sock that are shareable with other INET connection oriented protocols, such as DCCP, that in my private tree already uses most of these members. The functions that operate on these members were renamed, using a inet_csk_ prefix while not being moved yet to a new file, so as to ease the review of these changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:43:19 -07:00
Arnaldo Carvalho de Melo	8feaf0c0a5	[INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets This paves the way to generalise the rest of the sock ID lookup routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro kernels (where IPv6 is always built as a module): [root@qemu ~]# grep tw_sock /proc/slabinfo tw_sock_TCPv6 0 0 128 31 1 tw_sock_TCP 0 0 96 41 1 [root@qemu ~]# Now if a protocol wants to use the TIME_WAIT generic infrastructure it only has to set the sk_prot->twsk_obj_size field with the size of its inet_timewait_sock derived sock and proto_register will create sk_prot->twsk_slab, for now its only for INET sockets, but we can introduce timewait_sock later if some non INET transport protocolo wants to use this stuff. Next changesets will take advantage of this new infrastructure to generalise even more TCP code. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o [acme@toy net-2.6.14]$ Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1 (qemu host)). Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:42:13 -07:00
Arnaldo Carvalho de Melo	c752f0739f	[TCP]: Move the tcp sock states to net/tcp_states.h Lots of places just needs the states, not even linux/tcp.h, where this enum was, needs it. This speeds up development of the refactorings as less sources are rebuilt when things get moved from net/tcp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:41:54 -07:00
Arnaldo Carvalho de Melo	a55ebcc4c4	[INET]: Move bind_hash from tcp_sk to inet_sk This should really be in a inet_connection_sock, but I'm leaving it for a later optimization, when some more fields common to INET transport protocols now in tcp_sk or inet_sk will be chunked out into inet_connection_sock, for now its better to concentrate on getting the changes in the core merged to leave the DCCP tree with only DCCP specific code. Next changesets will take advantage of this move to generalise things like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in the future, when tcp_tw_bucket gets transformed into the struct timewait_sock hierarchy. tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets moved to sk_prot. A cascade of incremental changes will ultimately make the tcp_lookup functions be fully generic. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:48 -07:00
Arnaldo Carvalho de Melo	0f7ff9274e	[INET]: Just rename the TCP hashtable functions/structs to inet_ This is to break down the complexity of the series of patches, making it very clear that this one just does: 1. renames tcp_ prefixed hashtable functions and data structures that were already mostly generic to inet_ to share it with DCCP and other INET transport protocols. 2. Removes not used functions (__tb_head & tb_head) 3. Removes some leftover prototypes in the headers (tcp_bucket_unlock & tcp_v4_build_header) Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port, __tcp_put_port, generic and get others like tcp_destroy_sock closer to generic (tcp_orphan_count will go to sk->sk_prot to allow this). Eventually most of these functions will be used passing the transport protocol inet_hashinfo structure. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:32 -07:00
David S. Miller	c1b4a7e695	[TCP]: Move to new TSO segmenting scheme. Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:24:38 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
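
Commit 4b74944044 in the list above widens prior_ssthresh from u16 to u32 because a window larger than a u16 can represent would be saved truncated and then restored wrongly on undo. The following is a minimal sketch of that truncation, not kernel code: it only models unsigned field widths with bit masks, and the helper names `store_u16` and `store_u32` as well as the sample value 70000 are assumptions chosen for illustration.

```python
# Illustrative model of the field widths discussed in commit 4b74944044.
# store_u16/store_u32 are hypothetical helpers, not kernel functions.

U16_MASK = 0xFFFF
U32_MASK = 0xFFFFFFFF


def store_u16(value: int) -> int:
    """Return the value that survives a store into a 16-bit unsigned field."""
    return value & U16_MASK


def store_u32(value: int) -> int:
    """Return the value that survives a store into a 32-bit unsigned field."""
    return value & U32_MASK


if __name__ == "__main__":
    previous_window = 70000  # assumed sample value above the u16 maximum (65535)
    print("saved in u16:", store_u16(previous_window))
    print("saved in u32:", store_u32(previous_window))
```

With the 16-bit field the saved value wraps modulo 65536, so an undo would restore a much smaller threshold than the one in effect before; the 32-bit field keeps the sample value intact.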

54 Commits