linux-stable/net/ipv4
Jesper Dangaard Brouer c2a936600f net: increase fragment memory usage limits
Increase the amount of memory usage limits for incomplete
IP fragments.

Arguing for new thresh high/low values:

 High threshold = 4 MBytes
 Low  threshold = 3 MBytes

The fragmentation memory accounting code, tries to account for the
real memory usage, by measuring both the size of frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume.

Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200.  A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.

How many 64K fragment can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packets per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.

How many frag queue do we need to store, and given a current hash size
of 64, what is the average list length.

Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attack could send small fragments, the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length

When increasing these number, we also need to followup with
improvements, that is going to help scalability.  Simply increasing
the hash size, is not enough as the current implementation does not
have a per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17 14:29:53 -05:00
..
netfilter netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE 2012-12-16 23:28:30 +01:00
af_inet.c ipv4: Use FIELD_SIZEOF() in inet_init(). 2013-01-09 23:38:23 -08:00
ah4.c
arp.c arp: fix a regression in arp_solicit() 2012-12-24 18:42:58 -08:00
cipso_ipv4.c
datagram.c
devinet.c ipv4: fix NULL checking in devinet_ioctl() 2013-01-06 21:11:18 -08:00
esp4.c
fib_frontend.c ipv4: fib: fix a comment. 2013-01-11 15:58:08 -08:00
fib_lookup.h
fib_rules.c sections: fix section conflicts in net 2012-10-06 03:04:45 +09:00
fib_semantics.c ipv4: 16 slots in initial fib_info hash table 2012-10-22 14:29:06 -04:00
fib_trie.c ipv4/route: arg delay is useless in rt_cache_flush() 2012-09-18 15:44:34 -04:00
gre.c
icmp.c ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow() 2012-11-26 17:24:41 -05:00
igmp.c igmp: export symbol ip_mc_leave_group 2012-10-01 18:39:44 -04:00
inet_connection_sock.c inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock 2012-12-14 13:14:07 -05:00
inet_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-12-12 18:07:07 -08:00
inet_fragment.c ipv6: unify fragment thresh handling code 2012-09-19 17:23:28 -04:00
inet_hashtables.c net: move inet_dport/inet_num in sock_common 2012-11-30 15:02:56 -05:00
inet_lro.c
inet_timewait_sock.c
inetpeer.c Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2012-10-02 09:54:49 -07:00
ip_forward.c ipv4: introduce rt_uses_gateway 2012-10-08 17:42:36 -04:00
ip_fragment.c net: increase fragment memory usage limits 2013-01-17 14:29:53 -05:00
ip_gre.c ipv4/ip_gre: set transport header correctly to gre header 2012-12-26 15:19:56 -08:00
ip_input.c net: TCP early demux cleanup 2012-07-30 14:53:21 -07:00
ip_options.c net: Allow userns root to control ipv4 2012-11-18 20:32:45 -05:00
ip_output.c net: Handle encapsulated offloads before fragmentation or handing to lower dev 2012-12-09 00:20:28 -05:00
ip_sockglue.c net: prevent setting ttl=0 via IP_TTL 2013-01-08 17:57:10 -08:00
ip_vti.c net: Allow userns root to control ipv4 2012-11-18 20:32:45 -05:00
ipcomp.c
ipconfig.c net/ipv4/ipconfig: really display the BOOTP/DHCP server's address. 2013-01-04 15:14:14 -08:00
ipip.c net: Allow userns root to control ipv4 2012-11-18 20:32:45 -05:00
ipmr.c ipmr: advertise new mfc entries via rtnl 2012-12-04 13:08:11 -05:00
Kconfig
Makefile memcg: rename config variables 2012-07-31 18:42:43 -07:00
netfilter.c netfilter: properly annotate ipv4_netfilter_{init,fini}() 2012-09-03 13:56:04 +02:00
ping.c userns: Use kgids for sysctl_ping_group_range 2012-08-14 21:49:10 -07:00
proc.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
protocol.c net: Add net protocol offload registration infrustructure 2012-11-15 17:36:17 -05:00
raw.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-10-02 11:11:09 -07:00
route.c ipv4/route/rtnl: get mcast attributes when dst is multicast 2012-12-07 12:24:33 -05:00
syncookies.c tcp: make sysctl_tcp_ecn namespace aware 2013-01-06 21:09:56 -08:00
sysctl_net_ipv4.c tcp: make sysctl_tcp_ecn namespace aware 2013-01-06 21:09:56 -08:00
tcp.c tcp: fix splice() and tcp collapsing interaction 2013-01-10 14:09:57 -08:00
tcp_bic.c
tcp_cong.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
tcp_cubic.c
tcp_diag.c
tcp_fastopen.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c net: fix divide by zero in tcp algorithm illinois 2012-11-01 11:55:59 -04:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-01-15 15:05:59 -05:00
tcp_ipv4.c tcp: make sysctl_tcp_ecn namespace aware 2013-01-06 21:09:56 -08:00
tcp_lp.c
tcp_memcontrol.c
tcp_metrics.c tcp: handle tcp_net_metrics_init() order-5 memory allocation failures 2012-11-16 13:36:27 -05:00
tcp_minisocks.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-11-10 18:32:51 -05:00
tcp_output.c tcp: make sysctl_tcp_ecn namespace aware 2013-01-06 21:09:56 -08:00
tcp_probe.c
tcp_scalable.c
tcp_timer.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-11-10 18:32:51 -05:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-10-02 11:11:09 -07:00
udp_diag.c netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
udp_impl.h
udplite.c
xfrm4_input.c ipv4: Fix input route performance regression. 2012-07-26 15:50:39 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c xfrm: Fix the gc threshold value for ipv4 2012-11-13 09:15:07 +01:00
xfrm4_state.c
xfrm4_tunnel.c