linux-stable/Documentation/networking
Kuniyuki Iwashima 9804985bf2 udp: Introduce optional per-netns hash table.
The maximum hash table size is 64K due to the nature of the protocol. [0]
It's smaller than TCP, and fewer sockets can cause a performance drop.

On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in
different netns, creating 32Mi sockets without data transfer in the root
netns causes regression for the iperf3's connection.

  uhash_entries		sockets		length		Gbps
	    64K		      1		     1		5.69
			    1Mi		    16		5.27
			    2Mi		    32		4.90
			    4Mi		    64		4.09
			    8Mi		   128		2.96
			   16Mi		   256		2.06
			   32Mi		   512		1.12

The per-netns hash table breaks the lengthy lists into shorter ones.  It is
useful on a multi-tenant system with thousands of netns.  With smaller hash
tables, we can look up sockets faster, isolate noisy neighbours, and reduce
lock contention.

The max size of the per-netns table is 64K as well.  This is because the
possible hash range by udp_hashfn() always fits in 64K within the same
netns and we cannot make full use of the whole buckets larger than 64K.

  /* 0 < num < 64K  ->  X < hash < X + 64K */
  (num + net_hash_mix(net)) & mask;

Also, the min size is 128.  We use a bitmap to search for an available
port in udp_lib_get_port().  To keep the bitmap on the stack and not
fire the CONFIG_FRAME_WARN error at build time, we round up the table
size to 128.

The sysctl usage is the same with TCP:

  $ dmesg | cut -d ' ' -f 6- | grep "UDP hash"
  UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)

  # sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 65536  # can be changed by uhash_entries

  # sysctl net.ipv4.udp_child_hash_entries
  net.ipv4.udp_child_hash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = -65536  # share the global table

  # sysctl -w net.ipv4.udp_child_hash_entries=100
  net.ipv4.udp_child_hash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 128  # own a per-netns table with 2^n buckets

We could optimise the hash table lookup/iteration further by removing
the netns comparison for the per-netns one in the future.  Also, we
could optimise the sparse udp_hslot layout by putting it in udp_table.

[0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-16 09:43:35 +00:00
..
caif tty: cumulate and document tty_struct::flow* members 2021-05-13 16:57:16 +02:00
device_drivers Documentation: nfp: update documentation 2022-11-15 20:31:08 -08:00
devlink devlink: Add packet traps for 802.1X operation 2022-11-09 19:06:14 -08:00
dsa docs: net: dsa: update information about multiple CPU ports 2022-09-20 10:32:36 +02:00
mac80211_hwsim
6lowpan.rst
6pack.rst
af_xdp.rst doc, af_xdp: Fix bind flags option typo 2021-07-12 16:55:01 +02:00
alias.rst
arcnet-hardware.rst docs: networking: arcnet-hardware.rst: don't duplicate chapter names 2020-05-01 12:24:43 -07:00
arcnet.rst Documentation: networking: arcnet: drop doubled word 2020-07-04 17:46:21 -07:00
atm.rst
ax25.rst Documentation: networking: ax25: drop doubled word 2020-07-04 17:46:21 -07:00
bareudp.rst Documentation: bareudp: Corrected description of bareudp module. 2020-07-28 17:53:03 -07:00
batman-adv.rst batman-adv: Move IRC channel to hackint.org 2021-08-08 20:05:46 +02:00
bonding.rst Documentation: bonding: clarify supported modes for tlb_dynamic_lb 2022-08-30 23:17:54 -07:00
bridge.rst
can.rst can: add termination resistor documentation 2022-10-19 21:33:29 +02:00
can_ucan_protocol.rst Documentation: networking: can_ucan_protocol: drop doubled words 2020-07-04 17:46:21 -07:00
cdc_mbim.rst
checksum-offloads.rst
dccp.rst net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill) 2020-07-22 17:00:37 -07:00
dctcp.rst
dns_resolver.rst
driver.rst Documentation: networking: correct possessive "its" 2022-08-31 12:36:08 -07:00
eql.rst
ethtool-netlink.rst ethtool: linkstate: add a statistic for PHY down events 2022-11-08 10:36:54 +01:00
failover.rst
fib_trie.rst
filter.rst treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
gen_stats.rst
generic-hdlc.rst
generic_netlink.rst
gtp.rst
ieee802154.rst docs: net: ieee802154.rst: fix C expressions 2020-10-15 07:49:41 +02:00
ila.rst
index.rst Documentation: networking: TC queue based filtering 2022-10-25 10:32:40 +02:00
ioam6-sysctl.rst ipv6: ioam: Documentation for new IOAM sysctls 2021-07-21 08:14:33 -07:00
ip-sysctl.rst udp: Introduce optional per-netns hash table. 2022-11-16 09:43:35 +00:00
ip_dynaddr.rst
ipddp.rst
ipsec.rst
ipv6.rst
ipvlan.rst Documentation: networking: correct possessive "its" 2022-08-31 12:36:08 -07:00
ipvs-sysctl.rst netfilter: ipvs: Fix reuse connection if RS weight is 0 2021-11-08 11:42:47 +01:00
j1939.rst can: j1939: add tables for the CAN identifier and its fields 2020-11-20 09:43:29 +01:00
kapi.rst wimax: move out to staging 2020-10-29 19:27:45 +01:00
kcm.rst
l2tp.rst Documentation: networking: correct possessive "its" 2022-08-31 12:36:08 -07:00
lapb-module.rst
mac80211-auth-assoc-deauth.txt
mac80211-injection.rst doc: networking: wireless: fix wiki website url 2020-06-08 10:05:53 +02:00
mctp.rst mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control 2022-02-09 12:00:11 +00:00
mpls-sysctl.rst
mptcp-sysctl.rst Documentation: mptcp: fix pm_type formatting 2022-09-13 10:18:44 +02:00
msg_zerocopy.rst docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
multiqueue.rst
net_dim.rst
net_failover.rst Documentation: networking: net_failover: Fix documentation 2021-11-17 13:59:49 +00:00
netconsole.rst
netdev-features.rst net: hsr: add offloading support 2021-02-11 13:24:44 -08:00
netdevices.rst net: bonding: move ioctl handling to private ndo operation 2021-07-27 20:11:45 +01:00
netfilter-sysctl.rst
netif-msg.rst
nexthop-group-resilient.rst Documentation: net: Document resilient next-hop groups 2021-03-29 13:51:38 -07:00
nf_conntrack-sysctl.rst netfilter: conntrack: remove nf_conntrack_helper documentation 2022-09-20 23:50:03 +02:00
nf_flowtable.rst docs: nf_flowtable: fix compilation and warnings 2021-03-25 17:42:02 -07:00
nfc.rst
openvswitch.rst
operstates.rst docs: operstates: document IF_OPER_TESTING 2021-08-02 15:16:04 +01:00
packet_mmap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
page_pool.rst Documentation: update networking/page_pool.rst 2022-03-03 09:55:28 +00:00
phonet.rst
phy.rst docs: networking: phy: add missing space 2022-10-05 20:32:39 -07:00
pktgen.rst pktgen: document the latest pktgen usage options 2021-08-25 13:44:30 +01:00
plip.rst
ppp_generic.rst docs: update ppp_generic.rst to document new ioctls 2020-12-10 13:57:36 -08:00
proc_net_tcp.rst
radiotap-headers.rst
rds.rst Doc: networking: Fix the title's Sphinx overline in rds.rst 2021-11-29 15:18:21 -07:00
regulatory.rst doc: networking: wireless: fix wiki website url 2020-06-08 10:05:53 +02:00
representors.rst docs: net: add an explanation of VF (and other) Representors 2022-09-21 07:31:38 -07:00
rxrpc.rst rxrpc: Remove rxrpc_get_reply_time() which is no longer used 2022-09-01 11:44:13 +01:00
scaling.rst docs: networking: update XPS to account for netif_set_xps_queue 2020-10-13 16:21:54 -07:00
sctp.rst
secid.rst
seg6-sysctl.rst doc: move seg6_flowlabel to seg6-sysctl.rst 2021-04-14 13:13:15 -07:00
segmentation-offloads.rst
sfp-phylink.rst doc: sfp-phylink: Fix a broken reference 2022-08-02 21:45:07 -07:00
skbuff.rst skbuff: render the checksum comment to documentation 2022-05-10 17:48:37 -07:00
smc-sysctl.rst net/smc: Unbind r/w buffer size from clcsock and make them tunable 2022-09-22 12:58:21 +02:00
snmp_counter.rst net-next: docs: Fix typos in snmp_counter.rst 2021-01-05 17:07:38 -08:00
statistics.rst docs: networking: extend the statistics documentation 2021-04-16 16:59:20 -07:00
strparser.rst docs: networking: convert strparser.txt to ReST 2020-04-30 12:56:38 -07:00
switchdev.rst docs: net: add an explanation of VF (and other) Representors 2022-09-21 07:31:38 -07:00
sysfs-tagging.rst Documentation: better locations for sysfs-pci, sysfs-tagging 2020-10-09 09:33:23 -06:00
tc-actions-env-rules.rst docs: networking: convert tc-actions-env-rules.txt to ReST 2020-04-30 12:56:38 -07:00
tc-queue-filters.rst Documentation: networking: TC queue based filtering 2022-10-25 10:32:40 +02:00
tcp-thin.rst docs: networking: convert tcp-thin.txt to ReST 2020-04-30 12:56:38 -07:00
team.rst docs: networking: convert team.txt to ReST 2020-04-30 12:56:38 -07:00
timestamping.rst docs: networking: Use netif_rx(). 2022-03-04 12:02:19 +00:00
tipc.rst Documentation: add more details in tipc.rst 2021-07-01 13:18:18 -07:00
tls-offload-layers.svg
tls-offload-reorder-bad.svg
tls-offload-reorder-good.svg
tls-offload.rst net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled 2021-01-19 15:58:05 -08:00
tls.rst tls: rx: add counter for NoPad violations 2022-07-11 19:48:33 -07:00
tproxy.rst docs: networking: convert tproxy.txt to ReST 2020-04-30 12:56:38 -07:00
tuntap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
udplite.rst docs: networking: convert udplite.txt to ReST 2020-05-01 12:24:40 -07:00
vrf.rst doc: Document unexpected tcp_l3mdev_accept=1 behavior 2021-08-23 11:53:24 +01:00
vxlan.rst docs: vxlan: add info about device features 2020-09-28 12:50:12 -07:00
x25-iface.rst net: x25: Queue received packets in the drivers instead of per-CPU queues 2021-04-05 11:42:12 -07:00
x25.rst net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xfrm_device.rst docs: networking: Fix a typo 2021-03-20 19:02:42 -07:00
xfrm_proc.rst docs: networking: convert xfrm_proc.txt to ReST 2020-05-01 12:24:40 -07:00
xfrm_sync.rst docs: networking: convert xfrm_sync.txt to ReST 2020-05-01 12:24:41 -07:00
xfrm_sysctl.rst docs: networking: convert xfrm_sysctl.txt to ReST 2020-05-01 12:24:41 -07:00