Commit graph

34223 commits

Author SHA1 Message Date
John Fastabend
a96366bf26 net: sched: cls_u32 add missing rcu_assign_pointer and annotation
Add missing rcu_assign_pointer and missing  annotation for ht_up
in cls_u32.c

Caught by kbuild bot,

>> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
   net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
   net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
   net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
   net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
>> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend
80aab73de4 net: sched: fix unsued cpu variable
kbuild test robot reported an unused variable cpu in cls_u32.c
after the patch below. This happens when PERF and MARK config
variables are disabled

Fix this is to use separate variables for perf and mark
and define the cpu variable inside the ifdef logic.

Fixes: 459d5f626d ("net: sched: make cls_u32 per cpu")'
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
WANG Cong
69301eaa7f net_sched: fix a null pointer dereference in tcindex_set_parms()
This patch fixes the following crash:

[   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
[   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
[   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
[   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
[   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
[   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
[   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
[   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
[   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
[   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
[   42.200027] Stack:
[   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
[   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
[   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
[   42.200027] Call Trace:
[   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
[   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
[   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
[   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
[   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
[   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
[   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
[   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
[   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
[   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
[   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
[   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
[   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
[   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
[   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
[   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
[   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
[   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
[   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
[   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b

'p->h' could be NULL while 'cp->h' is always update to date.

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:20:09 -04:00
WANG Cong
44b75e4317 net_sched: fix memory leak in cls_tcindex
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:19:23 -04:00
Florian Fainelli
c1f570a6ab net: dsa: fix mii_bus to host_dev replacement
dsa_of_probe() still used cd->mii_bus instead of cd->host_dev when
building with CONFIG_OF=y. Fix this by making the replacement here as
well.

Fixes: b4d2394d01 ("dsa: Replace mii_bus with a generic host device")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:52:48 -04:00
WANG Cong
10ee1c34be net_sched: use tcindex_filter_result_init()
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:51:18 -04:00
WANG Cong
2f9a220eff net_sched: fix suspicious RCU usage in tcindex_classify()
This patch fixes the following kernel warning:

[   44.805900] [ INFO: suspicious RCU usage. ]
[   44.808946] 3.17.0-rc4+ #610 Not tainted
[   44.811831] -------------------------------
[   44.814873] net/sched/cls_tcindex.c:84 suspicious rcu_dereference_check() usage!

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:49:42 -04:00
WANG Cong
a57a65ba47 net_sched: fix an allocation bug in tcindex_set_parms()
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:48:23 -04:00
WANG Cong
80dcbd12fb net_sched: fix suspicious RCU usage in cls_bpf_classify()
Fixes: commit 1f947bf151 ("net: sched: rcu'ify cls_bpf")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:42:08 -04:00
Alexander Duyck
b4d2394d01 dsa: Replace mii_bus with a generic host device
This change makes it so that instead of passing and storing a mii_bus we
instead pass and store a host_dev.  From there we can test to determine the
exact type of device, and can verify it is the correct device for our switch.

So for example it would be possible to pass a device pointer from a pci_dev
and instead of checking for a PHY ID we could check for a vendor and/or device
ID.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Alexander Duyck
5075314e4e dsa: Split ops up, and avoid assigning tag_protocol and receive separately
This change addresses several issues.

First, it was possible to set tag_protocol without setting the ops pointer.
To correct that I have reordered things so that rcv is now populated before
we set tag_protocol.

Second, it didn't make much sense to keep setting the device ops each time a
new slave was registered.  So by moving the receive portion out into root
switch initialization that issue should be addressed.

Third, I wanted to avoid sending tags if the rcv pointer was not registered
so I changed the tag check to verify if the rcv function pointer is set on
the root tree.  If it is then we start sending DSA tagged frames.

Finally I split the device ops pointer in the structures into two spots.  I
placed the rcv function pointer in the root switch since this makes it
easiest to access from there, and I placed the xmit function pointer in the
slave for the same reason.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Eric Dumazet
b3d6cb92fd tcp: do not copy headers in tcp_collapse()
tcp_collapse() wants to shrink skb so that the overhead is minimal.

Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer
need to keep around full headers.
Whole available space is dedicated to the payload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:08 -04:00
Eric Dumazet
e93a0435f8 tcp: allow segment with FIN in tcp_try_coalesce()
We can allow a segment with FIN to be aggregated,
if we take care to add tcp flags,
and if skb_try_coalesce() takes care of zero sized skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:07 -04:00
Eric Dumazet
e11ecddf51 tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
Input path of TCP do not currently uses TCP_SKB_CB(skb)->tcp_flags,
which is only used in output path.

tcp_recvmsg(), looks at tcp_hdr(skb)->syn for every skb found in receive queue,
and its unfortunate because this bit is located in a cache line right before
the payload.

We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags.

This patch does so, and avoids the cache line miss in tcp_recvmsg()

Following patches will
- allow a segment with FIN being coalesced in tcp_try_coalesce()
- simplify tcp_collapse() by not copying the headers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:07 -04:00
Sasha Levin
c0d1379a19 net: bpf: correctly handle errors in sk_attach_filter()
Commit "net: bpf: make eBPF interpreter images read-only" has changed bpf_prog
to be vmalloc()ed but never handled some of the errors paths of the old code.

On error within sk_attach_filter (which userspace can easily trigger), we'd
kfree() the vmalloc()ed memory, and leak the internal bpf_work_struct.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:37:49 -04:00
Hannes Frederic Sowa
233577a220 net: filter: constify detection of pkt_type_offset
Currently we have 2 pkt_type_offset functions doing the same thing and
spread across the architecture files. Remove those and replace them
with a PKT_TYPE_OFFSET macro helper which gets the constant value from a
zero sized sk_buff member right in front of the bitfield with offsetof.
This new offset marker does not change size of struct sk_buff.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:07:21 -04:00
Florian Fainelli
ac7a04c33d net: dsa: change tag_protocol to an enum
Now that we introduced an additional multiplexing/demultiplexing layer
with commit 3e8a72d1da ("net: dsa: reduce number of protocol hooks")
that lives within the DSA code, we no longer need to have a given switch
driver tag_protocol be an actual ethertype value, instead, we can
replace it with an enum: dsa_tag_protocol.

Do this replacement in the drivers, which allows us to get rid of the
cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not
need it anymore.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:04:35 -04:00
WANG Cong
3ce62a84d5 ipv6: exit early in addrconf_notify() if IPv6 is disabled
If IPv6 is explicitly disabled before the interface comes up,
it makes no sense to continue when it comes up, even just
print a message.

(I am not sure about other cases though, so I prefer not to touch)

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:39:40 -04:00
WANG Cong
1691c63ea4 ipv6: refactor ipv6_dev_mc_inc()
Refactor out allocation and initialization and make
the refcount code more readable.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
f7ed925c1b ipv6: update the comment in mcast.c
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
414b6c943f ipv6: drop some rcu_read_lock in mcast
Similarly the code is already protected by rtnl lock.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
b5350916bf ipv6: drop ipv6_sk_mc_lock in mcast
Similarly the code is already protected by rtnl lock.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
83aa29eefd ipv6: refactor __ipv6_dev_ac_inc()
Refactor out allocation and initialization and make
the refcount code more readable.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
013b4d9038 ipv6: clean up ipv6_dev_ac_inc()
Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc()
to reflect this change.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
b03a9c04a3 ipv6: remove ipv6_sk_ac_lock
Just move rtnl lock up, so that the anycast list can be protected
by rtnl lock now.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong
6c555490e0 ipv6: drop useless rcu_read_lock() in anycast
These code is now protected by rtnl lock, rcu read lock
is useless now.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
John Fastabend
1f947bf151 net: sched: rcu'ify cls_bpf
This patch makes the cls_bpf classifier RCU safe. The tcf_lock
was being used to protect a list of cls_bpf_prog now this list
is RCU safe and updates occur with rcu_replace.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
b929d86d25 net: sched: rcu'ify cls_rsvp
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
1ce87720d4 net: sched: make cls_u32 lockless
Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
459d5f626d net: sched: make cls_u32 per cpu
This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
331b72922c net: sched: RCU cls_tcindex
Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
1109c00547 net: sched: RCU cls_route
RCUify the route classifier. For now however spinlock's are used to
protect fastmap cache.

The issue here is the fastmap may be read by one CPU while the
cache is being updated by another. An array of pointers could be
one possible solution.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
e35a8ee599 net: sched: fw use RCU
RCU'ify fw classifier.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
70da9f0bf9 net: sched: cls_flow use RCU
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
952313bd62 net: sched: cls_cgroup use RCU
Make cgroup classifier safe for RCU.

Also drops the calls in the classify routine that were doing a
rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held
entering this routine we have issues with deleting the classifier
chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock()
pair noting all paths AFAIK hold rcu_read_lock.

If there is a case where classify is called without the rcu read lock
then an rcu splat will occur and we can correct it.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend
9888faefe1 net: sched: cls_basic use RCU
Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However because it may be referenced in
the classify() path we must wait an RCU grace period before free'ing
it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern
is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend
25d8c0d55f net: rcu-ify tcf_proto
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend
46e5da40ae net: qdisc: use rcu prefix and silence sparse warnings
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
Scott Wood
2d8f7e2c8a udp: Fix inverted NAPI_GRO_CB(skb)->flush test
Commit 2abb7cdc0d ("udp: Add support for doing checksum unnecessary
conversion") caused napi_gro_cb structs with the "flush" field zero to
take the "udp_gro_receive" path rather than the "set flush to 1" path
that they would previously take.  As a result I saw booting from an NFS
root hang shortly after starting userspace, with "server not
responding" messages.

This change to the handling of "flush == 0" packets appears to be
incidental to the goal of adding new code in the case where
skb_gro_checksum_validate_zero_check() returns zero.  Based on that and
the fact that it breaks things, I'm assuming that it is unintentional.

Fixes: 2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:55:41 -04:00
Alexander Duyck
bf7fa551e0 mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path
There is a possible issue with the use, or lack thereof of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.

Specifically if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0 it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket.  After this the
buffer is enqueued on sk_error_queue, however the queue has already been
flushed resulting in at least a memory leak, if not a data corruption.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:51:25 -04:00
Alexander Duyck
cab41c47d9 skb: Add documentation for skb_clone_sk
This change adds some documentation to the call skb_clone_sk.  This is
meant to help clarify the purpose of the function for other developers.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:51:24 -04:00
Erik Hugne
0fc4dffad1 tipc: fix sparse warnings
This fixes the following sparse warnings:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return bool upon success, rather than a
potentially freed pointer.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 14:00:58 -07:00
David S. Miller
0aac383353 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
nf-next pull request

The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:

1) use __u8 instead of u_int8_t in arptables header, from
   Mike Frysinger.

2) Add support to match by skb->pkttype to the meta expression, from
   Ana Rey.

3) Add support to match by cpu to the meta expression, also from
   Ana Rey.

4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
   Vytas Dauksa.

5) Fix netnet and netportnet hash types the range support for IPv4,
   from Sergey Popovich.

6) Fix missing-field-initializer warnings resolved, from Mark Rustad.

7) Dan Carperter reported possible integer overflows in ipset, from
   Jozsef Kadlecsick.

8) Filter out accounting objects in nfacct by type, so you can
   selectively reset quotas, from Alexey Perevalov.

9) Move specific NAT IPv4 functions to the core so x_tables and
   nf_tables can share the same NAT IPv4 engine.

10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.

11) Move specific NAT IPv6 functions to the core so x_tables and
    nf_tables can share the same NAT IPv4 engine.

12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.

13) Refactor code to add nft_delrule(), which can be reused in the
    enhancement of the NFT_MSG_DELTABLE to remove a table and its
    content, from Arturo Borrero.

14) Add a helper function to unregister chain hooks, from
    Arturo Borrero.

15) A cleanup to rename to nft_delrule_by_chain for consistency with
    the new nft_*() functions, also from Arturo.

16) Add support to match devgroup to the meta expression, from Ana Rey.

17) Reduce stack usage for IPVS socket option, from Julian Anastasov.

18) Remove unnecessary textsearch state initialization in xt_string,
    from Bojan Prtvar.

19) Add several helper functions to nf_tables, more work to prepare
    the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.

20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
    Arturo Borrero.

21) Support NAT flags in the nat expression to indicate the flavour,
    eg. random fully, from Arturo.

22) Add missing audit code to ebtables when replacing tables, from
    Nicolas Dichtel.

23) Generalize the IPv4 masquerading code to allow its re-use from
    nf_tables, from Arturo.

24) Generalize the IPv6 masquerading code, also from Arturo.

25) Add the new masq expression to support IPv4/IPv6 masquerading
    from nf_tables, also from Arturo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:46:32 -07:00
Joe Perches
b167a37c7b netfilter: Convert pr_warning to pr_warn
Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches
47c4cfc37f iucv: Convert pr_warning to pr_warn
Use the more common pr_warn.
Coalesce formats.
Realign arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches
294a0b7f31 pktgen: Convert pr_warning to pr_warn
Use the more common pr_warn.
Realign arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches
ef423a4109 atm: Convert pr_warning to pr_warn
Use the more common pr_warn.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Tom Herbert
19424e052f sit: Add gro callbacks to sit_offload
Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to
support GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00
Tom Herbert
9667e9bb3f ipip: Add gro callbacks to ipip offload
Add inet_gro_receive and inet_gro_complete to ipip_offload to
support GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00
Tom Herbert
03d56daafe ipv6: Clear flush_id to make GRO work
In TCP gro we check flush_id which is derived from the IP identifier.
In IPv4 gro path the flush_id is set with the expectation that every
matched packet increments IP identifier. In IPv6, the flush_id is
never set and thus is uinitialized. What's worse is that in IPv6
over IPv4 encapsulation, the IP identifier is taken from the outer
header which is currently not incremented on every packet for Linux
stack, so GRO in this case never matches packets (identifier is
not increasing).

This patch clears flush_id for every time for a matched packet in
IPv6 gro_receive. We need to do this each time to overwrite the
setting that would be done in IPv4 gro_receive per the outer
header in IPv6 over Ipv4 encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00