linux-stable/include/net
Catalin(ux) M. BOIE 6f7edb4881 IPVS: Allow boot time change of hash size
I was very frustrated about the fact that I have to recompile the kernel
to change the hash size. So, I created this patch.

If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
command line, or, if you built IPVS as modules, you can add
options ip_vs conn_tab_bits=??.

To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.

It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.

Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:

We do however run into the same problem with the default setting (212 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.

To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (218 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.

With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
1719761  42.3741  ip_vs.ko                 ip_vs.ko      ip_vs_conn_in_get
302577    7.4554  bnx2                     bnx2          /bnx2
181984    4.4840  vmlinux                  vmlinux       __ticket_spin_lock
128636    3.1695  vmlinux                  vmlinux       ip_route_input
74345     1.8318  ip_vs.ko                 ip_vs.ko      ip_vs_conn_out_get
68482     1.6874  vmlinux                  vmlinux       mwait_idle

After loading the recompiled kernel with 218 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:

samples  %        image name               app name
symbol name
265641   14.4616  bnx2                     bnx2         /bnx2
143251    7.7986  vmlinux                  vmlinux      __ticket_spin_lock
140661    7.6576  ip_vs.ko                 ip_vs.ko     ip_vs_conn_in_get
94364     5.1372  vmlinux                  vmlinux      mwait_idle
86267     4.6964  vmlinux                  vmlinux      ip_route_input

[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-05 05:50:24 +01:00
..
9p 9p: fix readdir corner cases 2009-11-02 08:43:45 -06:00
bluetooth Bluetooth: Implement RejActioned flag 2009-12-03 19:34:24 +01:00
irda net: mark read-only arrays as const 2009-08-05 10:42:58 -07:00
iucv af_iucv: Return -EAGAIN if iucv msg limit is exceeded 2009-06-19 00:10:40 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-12-16 10:33:18 -08:00
netns net: Allow xfrm_user_net_exit to batch efficiently. 2009-12-03 12:22:03 -08:00
phonet Phonet: convert devices list to RCU 2009-11-18 10:08:26 -08:00
sctp Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
tc_act pkt_sched: skbedit add support for setting mark 2009-10-22 21:56:42 -07:00
tipc
act_api.h net: restore gnet_stats_basic to previous definition 2009-08-17 21:33:49 -07:00
addrconf.h bonding: remap muticast addresses without using dev_close() and dev_open() 2009-09-15 02:37:40 -07:00
af_ieee802154.h af_ieee802154: add support for WANT_ACK socket option 2009-08-12 21:54:50 -07:00
af_rxrpc.h
af_unix.h net: Fix soft lockups/OOM issues w/ unix garbage collector 2008-11-26 15:32:27 -08:00
ah.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
arp.h net: make neigh_ops constant 2009-09-01 17:40:57 -07:00
atmclip.h clip: convert to internal network_device_stats 2009-01-21 14:01:59 -08:00
ax25.h
ax88796.h ax88796: Add method to take MAC from platform data 2009-03-24 23:32:03 -07:00
cfg80211.h mac80211/cfg80211: add station events 2009-12-28 16:55:06 -05:00
checksum.h include/net net/ - csum_partial - remove unnecessary casts 2008-11-19 15:44:53 -08:00
cipso_ipv4.h netlabel: Label incoming TCP connections correctly in SELinux 2009-03-28 15:01:36 +11:00
compat.h net: fix compat_sys_recvmmsg parameter type 2009-12-11 15:07:56 -08:00
datalink.h
dcbnl.h dcbnl: Add support for setapp/getapp to netdev dcbnl_rtnl_ops 2009-09-01 01:24:30 -07:00
dn.h decnet: compile fix for removal of byteorder wrapper 2008-11-27 23:04:13 -08:00
dn_dev.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
dn_fib.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
dn_neigh.h
dn_nsp.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
dn_route.h
dsa.h dsa: add switch chip cascading support 2009-03-21 19:06:54 -07:00
dsfield.h
dst.h net: Add rtnetlink init_rcvwnd to set the TCP initial receive window 2009-12-23 14:13:30 -08:00
dst_ops.h netns: embed ip6_dst_ops directly 2009-09-01 17:40:31 -07:00
esp.h
ethoc.h net: Add support for the OpenCores 10/100 Mbps Ethernet MAC. 2009-03-27 00:16:21 -07:00
fib_rules.h net: Allow fib_rule_unregister to batch 2009-12-03 12:22:55 -08:00
flow.h netns xfrm: lookup in netns 2008-11-25 17:35:18 -08:00
garp.h
gen_stats.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
genetlink.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
icmp.h mib: put icmpmsg statistics on struct net 2008-07-18 04:04:22 -07:00
ieee80211_radiotap.h mac80211: fix radiotap header generation 2009-10-30 16:49:20 -04:00
ieee802154.h ieee802154: move headers out of extra directory 2009-07-23 17:08:51 +04:00
ieee802154_netdev.h ieee802154: add an mlme_ops call to retrieve PHY object 2009-11-06 14:32:18 +03:00
if_inet6.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
inet6_connection_sock.h
inet6_hashtables.h tcp: Fix a connect() race with timewait sockets 2009-12-08 20:17:51 -08:00
inet_common.h
inet_connection_sock.h net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
inet_ecn.h net: replace __constant_{endian} uses in net headers 2009-02-14 22:58:35 -08:00
inet_frag.h inet fragments: fix sparse warning: context imbalance 2009-02-26 23:13:35 -08:00
inet_hashtables.h tcp: Fix a connect() race with timewait sockets 2009-12-08 20:17:51 -08:00
inet_sock.h inet: rename some inet_sock fields 2009-10-18 18:52:53 -07:00
inet_timewait_sock.h tcp: Fix a connect() race with timewait sockets 2009-12-08 20:17:51 -08:00
inetpeer.h inetpeer: Optimize inet_getid() 2009-11-13 20:46:58 -08:00
ip.h netfilter: fix crashes in bridge netfilter caused by fragment jumps 2009-12-15 16:59:59 +01:00
ip6_checksum.h
ip6_fib.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
ip6_route.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
ip6_tunnel.h
ip_fib.h Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-11-06 00:55:55 -08:00
ip_vs.h IPVS: Allow boot time change of hash size 2010-01-05 05:50:24 +01:00
ipcomp.h ipsec: ipcomp - Merge IPComp implementations 2008-07-25 02:54:40 -07:00
ipconfig.h
ipip.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
ipv6.h netfilter: fix crashes in bridge netfilter caused by fragment jumps 2009-12-15 16:59:59 +01:00
ipx.h net: replace __constant_{endian} uses in net headers 2009-02-14 22:58:35 -08:00
iw_handler.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
lapb.h
lib80211.h wireless: missing include in lib80211.h 2008-11-21 11:42:55 -05:00
llc.h llc: convert llc_sap_list to RCU 2009-12-26 20:46:28 -08:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h llc: use a device based hash table to speed up multicast delivery 2009-12-26 20:43:57 -08:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
mac80211.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-12-30 15:25:08 -05:00
mip6.h
ndisc.h sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
neighbour.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2009-12-14 09:58:24 -08:00
net_namespace.h net: Add support for batching network namespace cleanups 2009-12-03 12:22:01 -08:00
netdma.h net_dma: convert to dma_find_channel 2009-01-06 11:38:15 -07:00
netevent.h
netlabel.h netlabel: Cleanup the Smack/NetLabel code to fix incoming TCP connections 2009-03-28 15:01:37 +11:00
netlink.h netlink: constify nlmsghdr arguments 2009-08-25 16:07:40 +02:00
netrom.h netrom: convert to internal net_device_stats 2009-01-21 14:02:01 -08:00
nexthop.h
nl802154.h ieee802154: add support for channel pages from IEEE 802.15.4-2006 2009-08-19 23:08:22 +04:00
p8022.h
pkt_cls.h net: rename skb->iif to skb->skb_iif 2009-11-20 15:35:04 -08:00
pkt_sched.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
protocol.h net: drop capability from protocol definitions 2009-11-05 21:40:17 -08:00
psnap.h snap: use const for descriptor 2009-03-21 19:06:50 -07:00
raw.h
rawv6.h ipv6: Use correct data types for ICMPv6 type and code 2009-06-23 04:31:07 -07:00
red.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
regulatory.h cfg80211: clean up includes 2009-04-22 16:57:17 -04:00
request_sock.h TCPCT part 1a: add request_values parameter for sending SYNACK 2009-12-02 22:07:23 -08:00
rose.h NET: ROSE: Don't use static buffer. 2009-07-26 19:11:14 -07:00
route.h net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH 2009-12-01 16:15:50 -08:00
rtnetlink.h net: Support specifying the network namespace upon device creation. 2009-11-08 00:53:51 -08:00
sch_generic.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
scm.h net: cleanup include/net 2009-11-04 05:06:25 -08:00
slhc_vj.h
snmp.h this_cpu: Use this_cpu operations for SNMP statistics 2009-10-03 19:48:22 +09:00
sock.h udp: secondary hash on (local port, local address) 2009-11-08 20:53:06 -08:00
stp.h
tcp.h net: Add rtnetlink init_rcvwnd to set the TCP initial receive window 2009-12-23 14:13:30 -08:00
tcp_states.h
timewait_sock.h net: Fix memory leak in the proto_register function 2008-11-21 16:45:22 -08:00
transp_v6.h inet: inet_connection_sock_af_ops const 2009-09-02 01:03:49 -07:00
udp.h udp: bind() optimisation 2009-11-10 20:54:38 -08:00
udplite.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
wext.h wext: refactor 2009-10-07 16:39:43 -04:00
wimax.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
wpan-phy.h ieee802154: add support for creation/removal of logic interfaces 2009-11-06 14:32:24 +03:00
x25.h X25: Move SYSCTL ifdefs into header 2009-11-29 00:24:59 -08:00
x25device.h
xfrm.h xfrm: Store aalg in xfrm_state with a user specified truncation length 2009-11-25 15:48:38 -08:00