linux-stable/net/core
Jesper Dangaard Brouer 8788370a1d pktgen: RCU-ify "if_list" to remove lock in next_to_run()
The if_lock()/if_unlock() in next_to_run() adds a significant
overhead, because its called for every packet in busy loop of
pktgen_thread_worker().  (Thomas Graf originally pointed me
at this lock problem).

Removing these two "LOCK" operations should in theory save us approx
16ns (8ns x 2), as illustrated below we do save 16ns when removing
the locks and introducing RCU protection.

Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
 * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
 * Diff   : +588195 pps --> -16.50ns

To understand this RCU patch, I describe the pktgen thread model
below.

In pktgen there is several kernel threads, but there is only one CPU
running each kernel thread.  Communication with the kernel threads are
done through some thread control flags.  This allow the thread to
change data structures at a know synchronization point, see main
thread func pktgen_thread_worker().

Userspace changes are communicated through proc-file writes.  There
are three types of changes, general control changes "pgctrl"
(func:pgctrl_write), thread changes "kpktgend_X"
(func:pktgen_thread_write), and interface config changes "etcX@N"
(func:pktgen_if_write).

Userspace "pgctrl" and "thread" changes are synchronized via the mutex
pktgen_thread_lock, thus only a single userspace instance can run.
The mutex is taken while the packet generator is running, by pgctrl
"start".  Thus e.g. "add_device" cannot be invoked when pktgen is
running/started.

All "pgctrl" and all "thread" changes, except thread "add_device",
communicate via the thread control flags.  The main problem is the
exception "add_device", that modifies threads "if_list" directly.

Fortunately "add_device" cannot be invoked while pktgen is running.
But there exists a race between "rem_device_all" and "add_device"
(which normally don't occur, because "rem_device_all" waits 125ms
before returning). Background'ing "rem_device_all" and running
"add_device" immediately allow the race to occur.

The race affects the threads (list of devices) "if_list".  The if_lock
is used for protecting this "if_list".  Other readers are given
lock-free access to the list under RCU read sections.

Note, interface config changes (via proc) can occur while pktgen is
running, which worries me a bit.  I'm assuming proc_remove() takes
appropriate locks, to assure no writers exists after proc_remove()
finish.

I've been running a script exercising the race condition (leading me
to fix the proc_remove order), without any issues.  The script also
exercises concurrent proc writes, while the interface config is
getting removed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 15:50:23 -07:00
..
datagram.c net: Fix save software checksum complete 2014-06-15 01:00:49 -07:00
dev.c rtnetlink: allow to register ops without ops->setup set 2014-07-01 14:40:17 -07:00
dev_addr_lists.c net: Add support for device specific address syncing 2014-06-02 10:40:54 -07:00
dev_ioctl.c net_tstamp: Add SIOCGHWTSTAMP ioctl to match SIOCSHWTSTAMP 2013-11-19 19:07:21 +00:00
drop_monitor.c net: drop_monitor: fix the value of maxattr 2013-12-09 21:10:38 -05:00
dst.c ipv4: fix dst race in sk_dst_get() 2014-06-25 17:41:44 -07:00
ethtool.c ethtool: Check that reserved fields of struct ethtool_rxfh are 0 2014-06-03 02:43:16 +01:00
fib_rules.c net: fix 'ip rule' iif/oif device rename 2014-02-09 19:02:52 -08:00
filter.c net: filter: Use kcalloc/kmalloc_array to allocate arrays 2014-06-25 16:40:02 -07:00
flow.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
flow_dissector.c flow_keys: Record IP layer protocol in skb_flow_dissect() 2014-06-23 14:32:19 -07:00
gen_estimator.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
gen_stats.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
iovec.c net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
link_watch.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
Makefile net: Add a software TSO helper API 2014-05-22 14:57:15 -04:00
neighbour.c neigh: set nud_state to NUD_INCOMPLETE when probing router reachability 2014-05-13 12:43:05 -04:00
net-procfs.c rps: selective flow shedding during softnet overflow 2013-05-20 13:48:04 -07:00
net-sysfs.c Revert "net: return actual error on register_queue_kobjects" 2014-06-19 18:12:15 -07:00
net-sysfs.h net: netdev_kobject_init: annotate with __init 2014-01-05 20:27:54 -05:00
net-traces.c
net_namespace.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-24 00:32:30 -04:00
netclassid_cgroup.c cgroup: remove css_parent() 2014-05-16 13:22:48 -04:00
netevent.c
netpoll.c netpoll: Use skb_irq_freeable to make zap_completion_queue safe. 2014-04-01 17:53:36 -04:00
netprio_cgroup.c cgroup: remove css_parent() 2014-05-16 13:22:48 -04:00
pktgen.c pktgen: RCU-ify "if_list" to remove lock in next_to_run() 2014-07-01 15:50:23 -07:00
ptp_classifier.c net: filter: let unattached filters use sock_fprog_kern 2014-05-23 16:48:05 -04:00
request_sock.c inet: reduce TLB pressure for listeners 2014-06-25 16:37:24 -07:00
rtnetlink.c rtnetlink: allow to register ops without ops->setup set 2014-07-01 14:40:17 -07:00
scm.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
secure_seq.c inetpeer: get rid of ip_id_count 2014-06-02 11:00:41 -07:00
skbuff.c net: fix setting csum_start in skb_segment() 2014-06-25 20:45:54 -07:00
sock.c net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
sock_diag.c net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump 2014-04-24 13:44:53 -04:00
stream.c net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
sysctl_net_core.c rps: NUMA flow limit allocations 2013-12-19 19:00:07 -05:00
timestamping.c net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
tso.c net: tso: Export symbols for modular build 2014-05-30 15:52:03 -07:00
user_dma.c
utils.c net: avoid dependency of net_get_random_once on nop patching 2014-05-14 00:37:34 -04:00