linux-stable/net/ipv6
Eric Dumazet 271b72c7fa udp: RCU handling for Unicast packets.
Goals are :

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
 writes happen in the fast path.

 Note: Multicasts and broadcasts will still need to take a lock,
 because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in the socket structure, which would increase
  memory needs and, more importantly, force us to use call_rcu(),
  which has the bad property of making socket structures cold.
  (The RCU grace period between freeing a socket and its potential
   reuse leaves the socket cold in the CPU cache.)
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Christoph Lameter :
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because UDP sockets are allocated from a dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special care must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, and inserted in a different chain, or in the worst case in the
same chain, while readers are doing lookups at the same time.

In order to avoid loops, a reader must check that each socket found in a
chain really belongs to the chain the reader was traversing. If it finds a
mismatch, the lookup must start again at the beginning. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we don't want to send the same message several times to the same socket.
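
The chain-membership check and restart described above can be sketched in
plain userspace C. This is only an illustration, not the code from this
commit: struct sock_sketch, slot_of(), walk_chain_once(), lookup_sketch()
and TABLE_SIZE are hypothetical names, the hash is a simple modulo, and the
rcu_read_lock()/rcu_read_unlock() section, memory barriers and socket
refcounting that a real lockless lookup needs are left out.

```c
#include <stddef.h>

#define TABLE_SIZE 4 /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for a hashed socket; not the kernel's struct sock. */
struct sock_sketch {
    unsigned int port;        /* lookup key */
    unsigned int slot;        /* chain the socket is currently hashed in */
    struct sock_sketch *next; /* next socket in that chain */
};

enum walk_result { WALK_FOUND, WALK_MISS, WALK_RESTART };

/* Assumed hash function: a plain modulo, chosen only for the sketch. */
static unsigned int slot_of(unsigned int port)
{
    return port % TABLE_SIZE;
}

/*
 * Walk one chain once. If a socket on the chain claims to belong to a
 * different slot, it was freed and reused while we were traversing, so
 * its next pointer may lead into another chain: report WALK_RESTART.
 */
static enum walk_result walk_chain_once(struct sock_sketch *head,
                                        unsigned int slot,
                                        unsigned int port,
                                        struct sock_sketch **out)
{
    struct sock_sketch *sk;

    *out = NULL;
    for (sk = head; sk != NULL; sk = sk->next) {
        if (sk->slot != slot)
            return WALK_RESTART;
        if (sk->port == port) {
            *out = sk;
            return WALK_FOUND;
        }
    }
    return WALK_MISS;
}

/* Repeat the walk from the head of the chain until it completes cleanly. */
static struct sock_sketch *lookup_sketch(struct sock_sketch *table[TABLE_SIZE],
                                         unsigned int port)
{
    unsigned int slot = slot_of(port);
    struct sock_sketch *sk;

    while (walk_chain_once(table[slot], slot, port, &sk) == WALK_RESTART)
        ; /* chain changed under us: start again at the beginning */
    return sk;
}
```

In this single-threaded sketch a mismatch never clears by itself, so the
restart path is best exercised through walk_chain_once() directly. The
restart is also why this scheme suits unicast only: a restarted multicast
walk could deliver the same message twice to a socket it had already
visited.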

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 02:11:14 -07:00
netfilter net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
addrconf.c net: replace all current users of NIP6_SEQFMT with %#p6 2008-10-28 16:05:40 -07:00
addrconf_core.c [IPV6]: ipv6_addr_type() doesn't know about RFC4193 addresses. 2007-07-31 02:28:21 -07:00
addrlabel.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
af_inet6.c netns: mib6 section fixlet 2008-10-13 18:54:07 -07:00
ah6.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
anycast.c net: replace all current users of NIP6_SEQFMT with %#p6 2008-10-28 16:05:40 -07:00
datagram.c IPv6: datagram_send_ctl() should exit immediately when an error occured 2008-07-29 23:57:58 -07:00
esp6.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
exthdrs.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
exthdrs_core.c
fib6_rules.c netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
icmp.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
inet6_connection_sock.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
inet6_hashtables.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
ip6_fib.c netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
ip6_flowlabel.c net: replace all current users of NIP6_SEQFMT with %#p6 2008-10-28 16:05:40 -07:00
ip6_input.c ipv6: added net argument to IP6_INC_STATS_BH 2008-10-08 11:09:27 -07:00
ip6_output.c net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
ip6_tunnel.c net: Use hton[sl]() instead of __constant_hton[sl]() where applicable 2008-09-20 22:20:49 -07:00
ip6mr.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
ipcomp6.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
ipv6_sockglue.c ipv6: Fix the return interface index when get it while no message is received. 2008-08-17 23:21:52 -07:00
Kconfig ipsec: ipcomp - Merge IPComp implementations 2008-07-25 02:54:40 -07:00
Makefile [IPV6] MROUTE: Support multicast forwarding. 2008-04-05 22:33:38 +09:00
mcast.c net: replace all current users of NIP6_SEQFMT with %#p6 2008-10-28 16:05:40 -07:00
mip6.c net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
ndisc.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
netfilter.c netns: correct mib stats in ip6_route_me_harder() 2008-10-14 22:55:21 -07:00
proc.c ipv6: making ip and icmp statistics per/namespace 2008-10-08 11:16:45 -07:00
protocol.c net: remove CVS keywords 2008-06-11 21:00:38 -07:00
raw.c netns: add net parameter to IP6_INC_STATS 2008-10-08 10:54:51 -07:00
reassembly.c ipv6: added net argument to IP6_ADD_STATS_BH 2008-10-08 11:13:31 -07:00
route.c net: replace all current users of NIP6_SEQFMT with %#p6 2008-10-28 16:05:40 -07:00
sit.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2008-06-16 18:25:48 -07:00
syncookies.c tcp: Fix IPv6 fallout from 'Port redirection support for TCP' 2008-10-19 23:35:58 -07:00
sysctl_net_ipv6.c ipv6: sysctl fixes 2008-08-25 15:18:15 -07:00
tcp_ipv6.c net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
tunnel6.c [IPV6] TUNNEL6: Fix incoming packet length check for inter-protocol tunnel. 2008-06-05 04:02:32 +09:00
udp.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
udp_impl.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udplite.c udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
xfrm6_input.c [XFRM] IPV6: Optimize xfrm6_input_addr(). 2008-03-25 10:23:56 +09:00
xfrm6_mode_beet.c ipsec: Interfamily IPSec BEET, ipv4-inner ipv6-outer 2008-08-06 02:40:25 -07:00
xfrm6_mode_ro.c [IPSEC]: Make x->lastused an unsigned long 2008-01-28 14:53:52 -08:00
xfrm6_mode_transport.c [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output 2007-10-10 16:55:54 -07:00
xfrm6_mode_tunnel.c [IPSEC]: Fix inter address family IPsec tunnel handling. 2008-03-24 14:51:51 -07:00
xfrm6_output.c [IPSEC]: Fix inter address family IPsec tunnel handling. 2008-03-24 14:51:51 -07:00
xfrm6_policy.c netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
xfrm6_state.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-03-27 18:48:56 -07:00
xfrm6_tunnel.c [XFRM] IPV6: Optimize __xfrm_tunnel_alloc_spi(). 2008-03-25 10:23:57 +09:00