Commit graph

5817 commits

Author SHA1 Message Date
Florian Westphal
62eec0d733 netfilter: conntrack: pass hook state to log functions
The packet logger backend is unable to provide the incoming (or
outgoing) interface name because that information isn't available.

Pass the hook state, it contains the network namespace, the protocol
family, the network interfaces and other things.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-18 14:47:43 +02:00
Pablo Neira Ayuso
836382dc24 netfilter: nf_tables: add last expression
Add a new optional expression that tells you when last matching on a
given rule / set element element has happened.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-17 03:23:00 +02:00
Phil Sutter
06e95f0a2a netfilter: nft_extdhr: Drop pointless check of tprot_set
Pablo says, tprot_set is only there to detect if tprot was set to
IPPROTO_IP as that evaluates to zero. Therefore, code asserting a
different value in tprot does not need to check tprot_set.

Fixes: 935b7f6430 ("netfilter: nft_exthdr: add TCP option matching")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 22:25:01 +02:00
Phil Sutter
5acc44f394 netfilter: nft_exthdr: Search chunks in SCTP packets only
Since user space does not generate a payload dependency, plain sctp
chunk matches cause searching in non-SCTP packets, too. Avoid this
potential mis-interpretation of packet data by checking pkt->tprot.

Fixes: 133dc203d7 ("netfilter: nft_exthdr: Support SCTP chunks")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 22:25:01 +02:00
Pablo Neira Ayuso
52f0f4e178 netfilter: nft_tproxy: restrict support to TCP and UDP transport protocols
Add unfront check for TCP and UDP packets before performing further
processing.

Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 20:51:51 +02:00
Pablo Neira Ayuso
8f518d43f8 netfilter: nft_osf: check for TCP packet before further processing
The osf expression only supports for TCP packets, add a upfront sanity
check to skip packet parsing if this is not a TCP packet.

Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 20:51:50 +02:00
Pablo Neira Ayuso
cdd73cc545 netfilter: nft_exthdr: check for IPv6 packet before further processing
ipv6_find_hdr() does not validate that this is an IPv6 packet. Add a
sanity check for calling ipv6_find_hdr() to make sure an IPv6 packet
is passed for parsing.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-16 20:51:50 +02:00
David S. Miller
22488e4550 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix a crash when stateful expression with its own gc callback
   is used in a set definition.

2) Skip IPv6 packets from any link-local address in IPv6 fib expression.
   Add a selftest for this scenario, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10 14:33:56 -07:00
Maxim Mikityanskiy
5fc177ab75 netfilter: synproxy: Fix out of bounds when parsing TCP options
The TCP option parser in synproxy (synproxy_parse_options) could read
one byte out of bounds. When the length is 1, the execution flow gets
into the loop, reads one byte of the opcode, and if the opcode is
neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds
the length of 1.

This fix is inspired by commit 9609dad263 ("ipv4: tcp_input: fix stack
out of bounds when parsing TCP options.").

v2 changes:

Added an early return when length < 0 to avoid calling
skb_header_pointer with negative length.

Cc: Young Xiao <92siuyang@gmail.com>
Fixes: 48b1de4c11 ("netfilter: add SYNPROXY core/target")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10 14:26:18 -07:00
David S. Miller
7f3579e189 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add nfgenmsg field to nfnetlink's struct nfnl_info and use it.

2) Remove nft_ctx_init_from_elemattr() and nft_ctx_init_from_setattr()
   helper functions.

3) Add the nf_ct_pernet() helper function to fetch the conntrack
   pernetns data area.

4) Expose TCP and UDP flowtable offload timeouts through sysctl,
   from Oz Shlomo.

5) Add nfnetlink_hook subsystem to fetch the netfilter hook
   pipeline configuration, from Florian Westphal. This also includes
   a new field to annotate the hook type as metadata.

6) Fix unsafe memory access to non-linear skbuff in the new SCTP
   chunk support for nft_exthdr, from Phil Sutter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-09 14:50:35 -07:00
Florian Westphal
c5c6accd7b netfilter: nf_tables: move base hook annotation to init helper
coverity scanner says:
2187  if (nft_is_base_chain(chain)) {
vvv   CID 1505166:  Memory - corruptions  (UNINIT)
vvv   Using uninitialized value "basechain".
2188  basechain->ops.hook_ops_type = NF_HOOK_OP_NF_TABLES;

... I don't see how nft_is_base_chain() can evaluate to true
while basechain pointer is garbage.

However, it seems better to place the NF_HOOK_OP_NF_TABLES annotation
in nft_basechain_hook_init() instead.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505166 ("Memory - corruptions")
Fixes: 65b8b7bfc5284f ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-09 21:29:23 +02:00
Florian Westphal
d4fb1f954f netfilter: nfnetlink_hook: add depends-on nftables
nfnetlink_hook.c: In function 'nfnl_hook_put_nft_chain_info':
nfnetlink_hook.c:76:7: error: implicit declaration of 'nft_is_active'

This macro is only defined when NF_TABLES is enabled.
While its possible to also add an ifdef-guard, the infrastructure
is currently not useful without nf_tables.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 252956528caa ("netfilter: add new hook nfnl subsystem")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-09 21:29:12 +02:00
Colin Ian King
5302560bb4 netfilter: nfnetlink_hook: fix array index out-of-bounds error
Currently the array net->nf.hooks_ipv6 is accessed by index hook
before hook is sanity checked. Fix this by moving the sanity check
to before the array access.

Addresses-Coverity: ("Out-of-bounds access")
Fixes: e2cf17d377 ("netfilter: add new hook nfnl subsystem")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-09 21:28:50 +02:00
Pablo Neira Ayuso
ad9f151e56 netfilter: nf_tables: initialize set before expression setup
nft_set_elem_expr_alloc() needs an initialized set if expression sets on
the NFT_EXPR_GC flag. Move set fields initialization before expression
setup.

[4512935.019450] ==================================================================
[4512935.019456] BUG: KASAN: null-ptr-deref in nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019487] Read of size 8 at addr 0000000000000070 by task nft/23532
[4512935.019494] CPU: 1 PID: 23532 Comm: nft Not tainted 5.12.0-rc4+ #48
[...]
[4512935.019502] Call Trace:
[4512935.019505]  dump_stack+0x89/0xb4
[4512935.019512]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019536]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019560]  kasan_report.cold.12+0x5f/0xd8
[4512935.019566]  ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019590]  nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019615]  nf_tables_newset+0xc7f/0x1460 [nf_tables]

Reported-by: syzbot+ce96ca2b1d0b37c6422d@syzkaller.appspotmail.com
Fixes: 65038428b2 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-09 21:06:35 +02:00
Phil Sutter
cf6b5ffdce netfilter: nft_exthdr: Fix for unsafe packet data read
While iterating through an SCTP packet's chunks, skb_header_pointer() is
called for the minimum expected chunk header size. If (that part of) the
skbuff is non-linear, the following memcpy() may read data past
temporary buffer '_sch'. Use skb_copy_bits() instead which does the
right thing in this situation.

Fixes: 133dc203d7 ("netfilter: nft_exthdr: Support SCTP chunks")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-09 02:42:52 +02:00
David S. Miller
126285651b Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Bug fixes overlapping feature additions and refactoring, mostly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07 13:01:52 -07:00
Florian Westphal
e2cf17d377 netfilter: add new hook nfnl subsystem
This nfnl subsystem allows to dump the list of all active netfiler hooks,
e.g. defrag, conntrack, nf/ip/arp/ip6tables and so on.

This helps to see what kind of features are currently enabled in
the network stack.

Sample output from nft tool using this infra:

 $ nft list hook ip input
 family ip hook input {
   +0000000010 nft_do_chain_inet [nf_tables] # nft table firewalld INPUT
   +0000000100 nf_nat_ipv4_local_in [nf_nat]
   +2147483647 ipv4_confirm [nf_conntrack]
 }

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:41:10 +02:00
Florian Westphal
7b4b2fa375 netfilter: annotate nf_tables base hook ops
This will allow a followup patch to treat the 'ops->priv' pointer
as nft_chain argument without having to first walk the table/chains
to check if there is a matching base chain pointer.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:38 +02:00
Oz Shlomo
1d91d2e1a7 netfilter: flowtable: Set offload timeouts according to proto values
Currently the aging period for tcp/udp connections is hard coded to
30 seconds. Aged tcp/udp connections configure a hard coded 120/30
seconds pickup timeout for conntrack.
This configuration may be too aggressive or permissive for some users.

Dynamically configure the nf flow table GC timeout intervals according
to the user defined values.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:38 +02:00
Oz Shlomo
975c57504d netfilter: conntrack: Introduce udp offload timeout configuration
UDP connections may be offloaded from nf conntrack to nf flow table.
Offloaded connections are aged after 30 seconds of inactivity.
Once aged, ownership is returned to conntrack with a hard coded pickup
time of 30 seconds, after which the connection may be deleted.
eted. The current aging intervals may be too aggressive for some users.

Provide users with the ability to control the nf flow table offload
aging and pickup time intervals via sysctl parameter as a pre-step for
configuring the nf flow table GC timeout intervals.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:37 +02:00
Oz Shlomo
ef8ed5ea09 netfilter: conntrack: Introduce tcp offload timeout configuration
TCP connections may be offloaded from nf conntrack to nf flow table.
Offloaded connections are aged after 30 seconds of inactivity.
Once aged, ownership is returned to conntrack with a hard coded pickup
time of 120 seconds, after which the connection may be deleted.
eted. The current aging intervals may be too aggressive for some users.

Provide users with the ability to control the nf flow table offload
aging and pickup time intervals via sysctl parameter as a pre-step for
configuring the nf flow table GC timeout intervals.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:37 +02:00
Pablo Neira Ayuso
0418b989a4 netfilter: nftables: add nf_ct_pernet() helper function
Consolidate call to net_generic(net, nf_conntrack_net_id) in this
wrapper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:37 +02:00
Pablo Neira Ayuso
670866512f netfilter: nf_tables: remove nft_ctx_init_from_setattr()
Replace nft_ctx_init_from_setattr() by nft_table_lookup().

This patch also disentangles nf_tables_delset() where NFTA_SET_TABLE is
required while nft_ctx_init_from_setattr() allows it to be optional.

From the nf_tables_delset() path, this also allows to set up the context
structure when it is needed.

Removing this helper function saves us 14 LoC, so it is not helping to
consolidate code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:36 +02:00
Pablo Neira Ayuso
e2b750d78b netfilter: nf_tables: remove nft_ctx_init_from_elemattr()
Replace nft_ctx_init_from_elemattr() by nft_table_lookup() and set up
the context structure right before it is really needed.

Moreover, nft_ctx_init_from_elemattr() is setting up the context
structure for codepaths where this is not really needed at all.

This helper function is also not helping to consolidate code, removing
it saves us 4 LoC.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:36 +02:00
Pablo Neira Ayuso
ef4b65e53c netfilter: nfnetlink: add struct nfgenmsg to struct nfnl_info and use it
Update the nfnl_info structure to add a pointer to the nfnetlink header.
This simplifies the existing codebase since this header is usually
accessed. Update existing clients to use this new field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07 12:23:36 +02:00
Pablo Neira Ayuso
8971ee8b08 netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
The private helper data size cannot be updated. However, updates that
contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
the same.

Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-02 12:43:50 +02:00
Pablo Neira Ayuso
1710eb913b netfilter: nft_ct: skip expectations for confirmed conntrack
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
conntrack entry. However, nf_ct_ext_add() can only be called for
!nf_ct_is_confirmed().

[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
[ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
[ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
[ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
[ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
[ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
[ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
[ 1825.352240] FS:  00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
[ 1825.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
[ 1825.352508] Call Trace:
[ 1825.352544]  nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
[ 1825.352641]  nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
[ 1825.352716]  nft_do_chain+0x232/0x850 [nf_tables]

Add the ct helper extension only for unconfirmed conntrack. Skip rule
evaluation if the ct helper extension does not exist. Thus, you can
only create expectations from the first packet.

It should be possible to remove this limitation by adding a new action
to attach a generic ct helper to the first packet. Then, use this ct
helper extension from follow up packets to create the ct expectation.

While at it, add a missing check to skip the template conntrack too
and remove check for IPCT_UNTRACK which is implicit to !ct.

Fixes: 857b46027d ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-02 12:43:34 +02:00
Florian Westphal
8a1c08ad19 netfilter: fix clang-12 fmt string warnings
nf_conntrack_h323_main.c:198:6: warning: format specifies type 'unsigned short' but
xt_AUDIT.c:121:9: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-01 23:53:51 +02:00
Florian Westphal
89258f8e41 netfilter: nft_set_pipapo_avx2: fix up description warnings
W=1:
net/netfilter/nft_set_pipapo_avx2.c:159: warning: Excess function parameter 'len' description in 'nft_pipapo_avx2_refill'
net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Function parameter or member 'key' not described in 'nft_pipapo_avx2_lookup'
net/netfilter/nft_set_pipapo_avx2.c:1124: warning: Excess function parameter 'elem' description in 'nft_pipapo_avx2_lookup'

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-01 23:53:51 +02:00
Florian Westphal
897389de48 netfilter: nf_tables: remove xt_action_param from nft_pktinfo
Init it on demand in the nft_compat expression.  This reduces size
of nft_pktinfo from 48 to 24 bytes on x86_64.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 22:15:59 +02:00
Florian Westphal
f06ad944b6 netfilter: nf_tables: remove unused arg in nft_set_pktinfo_unspec()
The functions pass extra skb arg, but either its not used or the helpers
can already access it via pkt->skb.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:54 +02:00
Florian Westphal
2d7b4ace07 netfilter: nf_tables: add and use nft_thoff helper
This allows to change storage placement later on without changing readers.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:54 +02:00
Florian Westphal
85554eb981 netfilter: nf_tables: add and use nft_sk helper
This allows to change storage placement later on without changing readers.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:53 +02:00
Pablo Neira Ayuso
e0241ae6ac netfilter: use nfnetlink_unicast()
Replace netlink_unicast() calls by nfnetlink_unicast() which already
deals with translating EAGAIN to ENOBUFS as the nfnetlink core expects.

nfnetlink_unicast() calls nlmsg_unicast() which returns zero in case of
success, otherwise the netlink core function netlink_rcv_skb() turns
err > 0 into an acknowlegment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:53 +02:00
Yang Li
02d8514267 netfilter: xt_CT: Remove redundant assignment to ret
Variable 'ret' is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed

Clean up the following clang-analyzer warning:

net/netfilter/xt_CT.c:175:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:53 +02:00
Jason Baron
07df3fc90a netfilter: x_tables: improve limit_mt scalability
We've seen this spin_lock show up high in profiles. Let's introduce a
lockless version. I've tested this using pktgen_sample01_simple.sh.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:52 +02:00
Juerg Haefliger
06f0299302 netfilter: Remove leading spaces in Kconfig
Remove leading spaces before tabs in Kconfig file(s) by running the
following command:

  $ find net/netfilter -name 'Kconfig*' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:52 +02:00
Florian Westphal
f227925e53 netfilter: nf_tables: prefer direct calls for set lookups
Extend nft_set_do_lookup() to use direct calls when retpoline feature
is enabled.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29 01:04:27 +02:00
Florian Westphal
0974cff3eb netfilter: add and use nft_set_do_lookup helper
Followup patch will add a CONFIG_RETPOLINE wrapper to avoid
the ops->lookup() indirection cost for retpoline builds.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28 21:11:41 +02:00
Stefano Brivio
a58db7ad80 netfilter: nft_set_pipapo_avx2: Skip LDMXCSR, we don't need a valid MXCSR state
We don't need a valid MXCSR state for the lookup routines, none of
the instructions we use rely on or affect any bit in the MXCSR
register.

Instead of calling kernel_fpu_begin(), we can pass 0 as mask to
kernel_fpu_begin_mask() and spare one LDMXCSR instruction.

Commit 49200d17d2 ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()")
already speeds up lookups considerably, and by dropping the MCXSR
initialisation we can now get a much smaller, but measurable, increase
in matching rates.

The table below reports matching rates and a wild approximation of
clock cycles needed for a match in a "port,net" test with 10 entries
from selftests/netfilter/nft_concat_range.sh, limited to the first
field, i.e. the port (with nft_set_rbtree initialisation skipped), run
on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$).

The (very rough) estimation of clock cycles is obtained by simply
dividing frequency by matching rate. The "cycles spared" column refers
to the difference in cycles compared to the previous row, and the rate
increase also refers to the previous row. Results are averages of six
runs.

Merely for context, I'm also reporting packet rates obtained by
skipping kernel_fpu_begin() and kernel_fpu_end() altogether (which
shows a very limited impact now), as well as skipping the whole lookup
function, compared to simply counting and dropping all packets using
the netdev hook drop (see nft_concat_range.sh for details). This
workload also includes packet generation with pktgen and the receive
path of veth.

                                      |matching|  est.  | cycles |  rate  |
                                      |  rate  | cycles | spared |increase|
                                      | (Mpps) |        |        |        |
--------------------------------------|--------|--------|--------|--------|
FNINIT, LDMXCSR (before 49200d17d2) |  5.245 |    553 |      - |      - |
LDMXCSR only (with 49200d17d2)      |  6.347 |    457 |     96 |  21.0% |
Without LDMXCSR (this patch)          |  6.461 |    449 |      8 |   1.8% |
-------- for reference only: ---------|--------|--------|--------|--------|
Without kernel_fpu_begin()            |  6.513 |    445 |      4 |   0.8% |
Without actual matching (return true) |  7.649 |    379 |     66 |  17.4% |
Without lookup operation (netdev drop)| 10.320 |    281 |     98 |  34.9% |

The clock cycles spared by avoiding LDMXCSR appear to be in line with CPI
and latency indicated in the manuals of comparable architectures: Intel
Skylake (CPI: 1, latency: 7) and AMD 12h (latency: 12) -- I couldn't find
this information for AMD 17h.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28 21:11:41 +02:00
Phil Sutter
133dc203d7 netfilter: nft_exthdr: Support SCTP chunks
Chunks are SCTP header extensions similar in implementation to IPv6
extension headers or TCP options. Reusing exthdr expression to find and
extract field values from them is therefore pretty straightforward.

For now, this supports extracting data from chunks at a fixed offset
(and length) only - chunks themselves are an extensible data structure;
in order to make all fields available, a nested extension search is
needed.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28 21:11:33 +02:00
Julian Anastasov
56e4ee82e8 ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
syzbot reported memory leak [1] when adding service with
HASHED flag. We should ignore this flag both from sockopt
and netlink provided data, otherwise the service is not
hashed and not visible while releasing resources.

[1]
BUG: memory leak
unreferenced object 0xffff888115227800 (size 512):
  comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline]
    [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline]
    [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343
    [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570
    [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101
    [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435
    [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857
    [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117
    [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline]
    [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline]
    [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125
    [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
    [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-27 13:06:48 +02:00
Pablo Neira Ayuso
179d9ba555 netfilter: nf_tables: fix table flag updates
The dormant flag need to be updated from the preparation phase,
otherwise, two consecutive requests to dorm a table in the same batch
might try to remove the same hooks twice, resulting in the following
warning:

 hook not found, pf 3 num 0
 WARNING: CPU: 0 PID: 334 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
 Modules linked in:
 CPU: 0 PID: 334 Comm: kworker/u4:5 Not tainted 5.12.0-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Workqueue: netns cleanup_net
 RIP: 0010:__nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480

This patch is a partial revert of 0ce7cf4127 ("netfilter: nftables:
update table flags from the commit phase") to restore the previous
behaviour.

However, there is still another problem: A batch containing a series of
dorm-wakeup-dorm table and vice-versa also trigger the warning above
since hook unregistration happens from the preparation phase, while hook
registration occurs from the commit phase.

To fix this problem, this patch adds two internal flags to annotate the
original dormant flag status which are __NFT_TABLE_F_WAS_DORMANT and
__NFT_TABLE_F_WAS_AWAKEN, to restore it from the abort path.

The __NFT_TABLE_F_UPDATE bitmask allows to handle the dormant flag update
with one single transaction.

Reported-by: syzbot+7ad5cd1615f2d89c6e7e@syzkaller.appspotmail.com
Fixes: 0ce7cf4127 ("netfilter: nftables: update table flags from the commit phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-24 17:49:57 +02:00
Pablo Neira Ayuso
983c4fcb81 netfilter: nf_tables: extended netlink error reporting for chain type
Users that forget to select the NAT chain type in netfilter's Kconfig
hit ENOENT when adding the basechain.

This report is however sparse since it might be the table, the chain
or the kernel module that is missing/does not exist.

This patch provides extended netlink error reporting for the
NFTA_CHAIN_TYPE netlink attribute, which conveys the basechain type.
If the user selects a basechain that his custom kernel does not support,
the netlink extended error provides a more accurate hint on the
described issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-21 00:51:24 +02:00
Pablo Neira Ayuso
c781471d67 netfilter: nf_tables: missing error reporting for not selected expressions
Sometimes users forget to turn on nftables extensions from Kconfig that
they need. In such case, the error reporting from userspace is
misleading:

 $ sudo nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
 ^^^^^^^^^^^^^^^^^^^^

Add missing NL_SET_BAD_ATTR() to provide a hint:

 $ nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
              ^^^^^^^

Fixes: 83d9dcba06 ("netfilter: nf_tables: extended netlink error reporting for expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-21 00:51:24 +02:00
Florian Westphal
22cbdbcfb6 netfilter: conntrack: unregister ipv4 sockopts on error unwind
When ipv6 sockopt register fails, the ipv4 one needs to be removed.

Fixes: a0ae2562c6 ("netfilter: conntrack: remove l3proto abstraction")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-20 00:01:11 +02:00
Stefano Brivio
f0b3d33806 netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
Arturo reported this backtrace:

[709732.358791] WARNING: CPU: 3 PID: 456 at arch/x86/kernel/fpu/core.c:128 kernel_fpu_begin_mask+0xae/0xe0
[709732.358793] Modules linked in: binfmt_misc nft_nat nft_chain_nat nf_nat nft_counter nft_ct nf_tables nf_conntrack_netlink nfnetlink 8021q garp stp mrp llc vrf intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul mgag200 ghash_clmulni_intel drm_kms_helper cec aesni_intel drm libaes crypto_simd cryptd glue_helper mei_me dell_smbios iTCO_wdt evdev intel_pmc_bxt iTCO_vendor_support dcdbas pcspkr rapl dell_wmi_descriptor wmi_bmof sg i2c_algo_bit watchdog mei acpi_ipmi ipmi_si button nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_devintf ipmi_msghandler ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor sd_mod t10_pi crc_t10dif crct10dif_generic raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ahci libahci tg3 libata xhci_pci libphy xhci_hcd ptp usbcore crct10dif_pclmul crct10dif_common bnxt_en crc32c_intel scsi_mod
[709732.358941]  pps_core i2c_i801 lpc_ich i2c_smbus wmi usb_common
[709732.358957] CPU: 3 PID: 456 Comm: jbd2/dm-0-8 Not tainted 5.10.0-0.bpo.5-amd64 #1 Debian 5.10.24-1~bpo10+1
[709732.358959] Hardware name: Dell Inc. PowerEdge R440/04JN2K, BIOS 2.9.3 09/23/2020
[709732.358964] RIP: 0010:kernel_fpu_begin_mask+0xae/0xe0
[709732.358969] Code: ae 54 24 04 83 e3 01 75 38 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 33 48 83 c4 10 5b c3 65 8a 05 5e 21 5e 76 84 c0 74 92 <0f> 0b eb 8e f0 80 4f 01 40 48 81 c7 00 14 00 00 e8 dd fb ff ff eb
[709732.358972] RSP: 0018:ffffbb9700304740 EFLAGS: 00010202
[709732.358976] RAX: 0000000000000001 RBX: 0000000000000003 RCX: 0000000000000001
[709732.358979] RDX: ffffbb9700304970 RSI: ffff922fe1952e00 RDI: 0000000000000003
[709732.358981] RBP: ffffbb9700304970 R08: ffff922fc868a600 R09: ffff922fc711e462
[709732.358984] R10: 000000000000005f R11: ffff922ff0b27180 R12: ffffbb9700304960
[709732.358987] R13: ffffbb9700304b08 R14: ffff922fc664b6c8 R15: ffff922fc664b660
[709732.358990] FS:  0000000000000000(0000) GS:ffff92371fec0000(0000) knlGS:0000000000000000
[709732.358993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[709732.358996] CR2: 0000557a6655bdd0 CR3: 000000026020a001 CR4: 00000000007706e0
[709732.358999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[709732.359001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[709732.359003] PKRU: 55555554
[709732.359005] Call Trace:
[709732.359009]  <IRQ>
[709732.359035]  nft_pipapo_avx2_lookup+0x4c/0x1cba [nf_tables]
[709732.359046]  ? sched_clock+0x5/0x10
[709732.359054]  ? sched_clock_cpu+0xc/0xb0
[709732.359061]  ? record_times+0x16/0x80
[709732.359068]  ? plist_add+0xc1/0x100
[709732.359073]  ? psi_group_change+0x47/0x230
[709732.359079]  ? skb_clone+0x4d/0xb0
[709732.359085]  ? enqueue_task_rt+0x22b/0x310
[709732.359098]  ? bnxt_start_xmit+0x1e8/0xaf0 [bnxt_en]
[709732.359102]  ? packet_rcv+0x40/0x4a0
[709732.359121]  nft_lookup_eval+0x59/0x160 [nf_tables]
[709732.359133]  nft_do_chain+0x350/0x500 [nf_tables]
[709732.359152]  ? nft_lookup_eval+0x59/0x160 [nf_tables]
[709732.359163]  ? nft_do_chain+0x364/0x500 [nf_tables]
[709732.359172]  ? fib4_rule_action+0x6d/0x80
[709732.359178]  ? fib_rules_lookup+0x107/0x250
[709732.359184]  nft_nat_do_chain+0x8a/0xf2 [nft_chain_nat]
[709732.359193]  nf_nat_inet_fn+0xea/0x210 [nf_nat]
[709732.359202]  nf_nat_ipv4_out+0x14/0xa0 [nf_nat]
[709732.359207]  nf_hook_slow+0x44/0xc0
[709732.359214]  ip_output+0xd2/0x100
[709732.359221]  ? __ip_finish_output+0x210/0x210
[709732.359226]  ip_forward+0x37d/0x4a0
[709732.359232]  ? ip4_key_hashfn+0xb0/0xb0
[709732.359238]  ip_sublist_rcv_finish+0x4f/0x60
[709732.359243]  ip_sublist_rcv+0x196/0x220
[709732.359250]  ? ip_rcv_finish_core.isra.22+0x400/0x400
[709732.359255]  ip_list_rcv+0x137/0x160
[709732.359264]  __netif_receive_skb_list_core+0x29b/0x2c0
[709732.359272]  netif_receive_skb_list_internal+0x1a6/0x2d0
[709732.359280]  gro_normal_list.part.156+0x19/0x40
[709732.359286]  napi_complete_done+0x67/0x170
[709732.359298]  bnxt_poll+0x105/0x190 [bnxt_en]
[709732.359304]  ? irqentry_exit+0x29/0x30
[709732.359309]  ? asm_common_interrupt+0x1e/0x40
[709732.359315]  net_rx_action+0x144/0x3c0
[709732.359322]  __do_softirq+0xd5/0x29c
[709732.359329]  asm_call_irq_on_stack+0xf/0x20
[709732.359332]  </IRQ>
[709732.359339]  do_softirq_own_stack+0x37/0x40
[709732.359346]  irq_exit_rcu+0x9d/0xa0
[709732.359353]  common_interrupt+0x78/0x130
[709732.359358]  asm_common_interrupt+0x1e/0x40
[709732.359366] RIP: 0010:crc_41+0x0/0x1e [crc32c_intel]
[709732.359370] Code: ff ff f2 4d 0f 38 f1 93 a8 fe ff ff f2 4c 0f 38 f1 81 b0 fe ff ff f2 4c 0f 38 f1 8a b0 fe ff ff f2 4d 0f 38 f1 93 b0 fe ff ff <f2> 4c 0f 38 f1 81 b8 fe ff ff f2 4c 0f 38 f1 8a b8 fe ff ff f2 4d
[709732.359373] RSP: 0018:ffffbb97008dfcd0 EFLAGS: 00000246
[709732.359377] RAX: 000000000000002a RBX: 0000000000000400 RCX: ffff922fc591dd50
[709732.359379] RDX: ffff922fc591dea0 RSI: 0000000000000a14 RDI: ffffffffc00dddc0
[709732.359382] RBP: 0000000000001000 R08: 000000000342d8c3 R09: 0000000000000000
[709732.359384] R10: 0000000000000000 R11: ffff922fc591dff0 R12: ffffbb97008dfe58
[709732.359386] R13: 000000000000000a R14: ffff922fd2b91e80 R15: ffff922fef83fe38
[709732.359395]  ? crc_43+0x1e/0x1e [crc32c_intel]
[709732.359403]  ? crc32c_pcl_intel_update+0x97/0xb0 [crc32c_intel]
[709732.359419]  ? jbd2_journal_commit_transaction+0xaec/0x1a30 [jbd2]
[709732.359425]  ? irq_exit_rcu+0x3e/0xa0
[709732.359447]  ? kjournald2+0xbd/0x270 [jbd2]
[709732.359454]  ? finish_wait+0x80/0x80
[709732.359470]  ? commit_timeout+0x10/0x10 [jbd2]
[709732.359476]  ? kthread+0x116/0x130
[709732.359481]  ? kthread_park+0x80/0x80
[709732.359488]  ? ret_from_fork+0x1f/0x30
[709732.359494] ---[ end trace 081a19978e5f09f5 ]---

that is, nft_pipapo_avx2_lookup() uses the FPU running from a softirq
that interrupted a kthread, also using the FPU.

That's exactly the reason why irq_fpu_usable() is there: use it, and
if we can't use the FPU, fall back to the non-AVX2 version of the
lookup operation, i.e. nft_pipapo_lookup().

Reported-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
Cc: <stable@vger.kernel.org> # 5.6.x
Fixes: 7400b06396 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-14 01:42:52 +02:00
Roi Dayan
c07531c01d netfilter: flowtable: Remove redundant hw refresh bit
Offloading conns could fail for multiple reasons and a hw refresh bit is
set to try to reoffload it in next sw packet.
But it could be in some cases and future points that the hw refresh bit
is not set but a refresh could succeed.
Remove the hw refresh bit and do offload refresh if requested.
There won't be a new work entry if a work is already pending
anyway as there is the hw pending bit.

Fixes: 8b3646d6e0 ("net/sched: act_ct: Support refreshing the flow table entries")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-14 01:34:26 +02:00
Eric Dumazet
6c8774a94e netfilter: nftables: avoid potential overflows on 32bit arches
User space could ask for very large hash tables, we need to make sure
our size computations wont overflow.

nf_tables_newset() needs to double check the u64 size
will fit into size_t field.

Fixes: 0ed6389c48 ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-07 10:01:39 +02:00
Eric Dumazet
a54754ec98 netfilter: nftables: avoid overflows in nft_hash_buckets()
Number of buckets being stored in 32bit variables, we have to
ensure that no overflows occur in nft_hash_buckets()

syzbot injected a size == 0x40000000 and reported:

UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
 __roundup_pow_of_two include/linux/log2.h:57 [inline]
 nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline]
 nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652
 nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline]
 nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322
 nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline]
 nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

Fixes: 0ed6389c48 ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-07 10:01:29 +02:00
Pablo Neira Ayuso
85dfd816fa netfilter: nftables: Fix a memleak from userdata error path in new objects
Release object name if userdata allocation fails.

Fixes: b131c96496 ("netfilter: nf_tables: add userdata support for nft_object")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-05 23:52:52 +02:00
Pablo Neira Ayuso
198ad97383 netfilter: remove BUG_ON() after skb_header_pointer()
Several conntrack helpers and the TCP tracker assume that
skb_header_pointer() never fails based on upfront header validation.
Even if this should not ever happen, BUG_ON() is a too drastic measure,
remove them.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-05 23:45:48 +02:00
Pablo Neira Ayuso
5e024c3254 netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
Do not assume that the tcph->doff field is correct when parsing for TCP
options, skb_header_pointer() might fail to fetch these bits.

Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-05 22:26:09 +02:00
Eric Dumazet
7072a355ba netfilter: nfnetlink: add a missing rcu_read_unlock()
Reported by syzbot :
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:201
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 26899, name: syz-executor.5
1 lock held by syz-executor.5/26899:
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline]
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226
Preemption disabled at:
[<ffffffff8917799e>] preempt_schedule_irq+0x3e/0x90 kernel/sched/core.c:5533
CPU: 1 PID: 26899 Comm: syz-executor.5 Not tainted 5.12.0-next-20210504-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:8338
 might_alloc include/linux/sched/mm.h:201 [inline]
 slab_pre_alloc_hook mm/slab.h:500 [inline]
 slab_alloc_node mm/slub.c:2845 [inline]
 kmem_cache_alloc_node+0x33d/0x3e0 mm/slub.c:2960
 __alloc_skb+0x20b/0x340 net/core/skbuff.c:413
 alloc_skb include/linux/skbuff.h:1107 [inline]
 nlmsg_new include/net/netlink.h:953 [inline]
 netlink_ack+0x1ed/0xaa0 net/netlink/af_netlink.c:2437
 netlink_rcv_skb+0x33d/0x420 net/netlink/af_netlink.c:2508
 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:650
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa8a03ee188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9
RDX: 0000000000000000 RSI: 0000000020000480 RDI: 0000000000000004
RBP: 00000000004bfce1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007fffe864480f R14: 00007fa8a03ee300 R15: 0000000000022000

================================================
WARNING: lock held when returning to user space!
5.12.0-next-20210504-syzkaller #0 Tainted: G        W
------------------------------------------------
syz-executor.5/26899 is leaving the kernel with locks still held!
1 lock held by syz-executor.5/26899:
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline]
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26899 at kernel/rcu/tree_plugin.h:359 rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359
Modules linked in:
CPU: 0 PID: 26899 Comm: syz-executor.5 Tainted: G        W         5.12.0-next-20210504-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359
Code: 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 2e 0d 00 00 8b bd cc 03 00 00 85 ff 7e 02 <0f> 0b 65 48 8b 2c 25 00 f0 01 00 48 8d bd cc 03 00 00 48 b8 00 00
RSP: 0000:ffffc90002fffdb0 EFLAGS: 00010002
RAX: 0000000000000007 RBX: ffff8880b9c36080 RCX: ffffffff8dc99bac
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
RBP: ffff88808b9d1c80 R08: 0000000000000000 R09: ffffffff8dc96917
R10: fffffbfff1b92d22 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88808b9d1c80 R14: ffff88808b9d1c80 R15: ffffc90002ff8000
FS:  00007fa8a03ee700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f09896ed000 CR3: 0000000032070000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __schedule+0x214/0x23e0 kernel/sched/core.c:5044
 schedule+0xcf/0x270 kernel/sched/core.c:5226
 exit_to_user_mode_loop kernel/entry/common.c:162 [inline]
 exit_to_user_mode_prepare+0x13e/0x280 kernel/entry/common.c:208
 irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:314
 asm_sysvec_reschedule_ipi+0x12/0x20 arch/x86/include/asm/idtentry.h:637
RIP: 0033:0x4665f9

Fixes: 50f2db9e36 ("netfilter: nfnetlink: consolidate callback types")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-05 17:26:02 +02:00
Pablo Neira Ayuso
c7d13358b6 netfilter: xt_SECMARK: add new revision to fix structure layout
This extension breaks when trying to delete rules, add a new revision to
fix this.

Fixes: 5e6874cdb8 ("[SECMARK]: Add xtables SECMARK target")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-03 23:02:44 +02:00
Arnd Bergmann
7acc0bb490 netfilter: nft_socket: fix build with CONFIG_SOCK_CGROUP_DATA=n
In some configurations, the sock_cgroup_ptr() function is not available:

net/netfilter/nft_socket.c: In function 'nft_sock_get_eval_cgroupv2':
net/netfilter/nft_socket.c:47:16: error: implicit declaration of function 'sock_cgroup_ptr'; did you mean 'obj_cgroup_put'? [-Werror=implicit-function-declaration]
   47 |         cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
      |                ^~~~~~~~~~~~~~~
      |                obj_cgroup_put
net/netfilter/nft_socket.c:47:14: error: assignment to 'struct cgroup *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion]
   47 |         cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
      |              ^

Change the caller to match the same #ifdef check, only calling it
when the function is defined.

Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 22:34:05 +02:00
Arnd Bergmann
8a7363f849 netfilter: nft_socket: fix an unused variable warning
The variable is only used in an #ifdef, causing a harmless warning:

net/netfilter/nft_socket.c: In function 'nft_socket_init':
net/netfilter/nft_socket.c:137:27: error: unused variable 'level' [-Werror=unused-variable]
  137 |         unsigned int len, level;
      |                           ^~~~~

Move it into the same #ifdef block.

Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 22:34:00 +02:00
Pablo Neira Ayuso
aaa31047a6 netfilter: nftables: add catch-all set element support
This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.

This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.

The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same of the set
element type to reuse the existing set element codebase.

The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.

The check for valid set element flags hava been updates to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 18:06:29 +02:00
Pablo Neira Ayuso
97c976d662 netfilter: nftables: add helper function to validate set element data
When binding sets to rule, validate set element data according to
set definition. This patch adds a helper function to be reused by
the catch-all set element support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 18:05:48 +02:00
Pablo Neira Ayuso
e6ba7cb63b netfilter: nftables: add helper function to flush set elements
This patch adds nft_set_flush() which prepares for the catch-all
element support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 18:05:45 +02:00
Pablo Neira Ayuso
6387aa6e59 netfilter: nftables: add loop check helper function
This patch adds nft_check_loops() to reuse it in the new catch-all
element codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 18:05:41 +02:00
Pablo Neira Ayuso
f8bb7889af netfilter: nftables: rename set element data activation/deactivation functions
Rename:

- nft_set_elem_activate() to nft_set_elem_data_activate().
- nft_set_elem_deactivate() to nft_set_elem_data_deactivate().

To prepare for updates in the set element infrastructure to add support
for the special catch-all element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27 18:05:36 +02:00
Florian Westphal
47a6959fa3 netfilter: allow to turn off xtables compat layer
The compat layer needs to parse untrusted input (the ruleset)
to translate it to a 64bit compatible format.

We had a number of bugs in this department in the past, so allow users
to turn this feature off.

Add CONFIG_NETFILTER_XTABLES_COMPAT kconfig knob and make it default to y
to keep existing behaviour.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 18:16:56 +02:00
Pablo Neira Ayuso
50f2db9e36 netfilter: nfnetlink: consolidate callback types
Add enum nfnl_callback_type to identify the callback type to provide one
single callback.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 18:16:56 +02:00
Pablo Neira Ayuso
7dab8ee3b6 netfilter: nfnetlink: pass struct nfnl_info to batch callbacks
Update batch callbacks to use the nfnl_info structure. Rename one
clashing info variable to expr_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 18:16:56 +02:00
Pablo Neira Ayuso
797d49805d netfilter: nfnetlink: pass struct nfnl_info to rcu callbacks
Update rcu callbacks to use the nfnl_info structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 18:16:52 +02:00
Pablo Neira Ayuso
a655536571 netfilter: nfnetlink: add struct nfnl_info and pass it to callbacks
Add a new structure to reduce callback footprint and to facilite
extensions of the nfnetlink callback interface in the future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:58:17 +02:00
Pablo Neira Ayuso
d59d2f82f9 netfilter: nftables: add nft_pernet() helper function
Consolidate call to net_generic(net, nf_tables_net_id) in this
wrapper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:58:17 +02:00
Phil Sutter
593268ddf3 netfilter: nf_log_syslog: Unset bridge logger in pernet exit
Without this, a stale pointer remains in pernet loggers after module
unload causing a kernel oops during dereference. Easily reproduced by:

| # modprobe nf_log_syslog
| # rmmod nf_log_syslog
| # cat /proc/net/netfilter/nf_log

Fixes: 77ccee96a6 ("netfilter: nf_log_bridge: merge with nf_log_syslog")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:47 +02:00
Florian Westphal
ae68933422 netfilter: ip_tables: pass table pointer via nf_hook_ops
iptable_x modules rely on 'struct net' to contain a pointer to the
table that should be evaluated.

In order to remove these pointers from struct net, pass them via
the 'priv' pointer in a similar fashion as nf_tables passes the
rule data.

To do that, duplicate the nf_hook_info array passed in from the
iptable_x modules, update the ops->priv pointers of the copy to
refer to the table and then change the hookfn implementations to
just pass the 'priv' argument to the traverser.

After this patch, the xt_table pointers can already be removed
from struct net.

However, changes to struct net result in re-compile of the entire
network stack, so do the removal after arptables and ip6tables
have been converted as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:46 +02:00
Florian Westphal
1ef4d6d1af netfilter: x_tables: add xt_find_table
This will be used to obtain the xt_table struct given address family and
table name.

Followup patches will reduce the number of direct accesses to the xt_table
structures via net->ipv{4,6}.ip(6)table_{nat,mangle,...} pointers, then
remove them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:39 +02:00
Florian Westphal
de8c12110a netfilter: disable defrag once its no longer needed
When I changed defrag hooks to no longer get registered by default I
intentionally made it so that registration can only be un-done by unloading
the nf_defrag_ipv4/6 module.

In hindsight this was too conservative; there is no reason to keep defrag
on while there is no feature dependency anymore.

Moreover, this won't work if user isn't allowed to remove nf_defrag module.

This adds the disable() functions for both ipv4 and ipv6 and calls them
from conntrack, TPROXY and the xtables socket module.

ipvs isn't converted here, it will behave as before this patch and
will need module removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:07 +02:00
Pablo Neira Ayuso
e0bb96db96 netfilter: nft_socket: add support for cgroupsv2
Allow to match on the cgroupsv2 id from ancestor level.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:07 +02:00
Florian Westphal
885e8c6824 netfilter: nat: move nf_xfrm_me_harder to where it is used
remove the export and make it static.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26 03:20:07 +02:00
David S. Miller
6dd06ec7c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add vlan match and pop actions to the flowtable offload,
   patches from wenxu.

2) Reduce size of the netns_ct structure, which itself is
   embedded in struct net Make netns_ct a read-mostly structure.
   Patches from Florian Westphal.

3) Add FLOW_OFFLOAD_XMIT_UNSPEC to skip dst check from garbage
   collector path, as required by the tc CT action. From Roi Dayan.

4) VLAN offload fixes for nftables: Allow for matching on both s-vlan
   and c-vlan selectors. Fix match of VLAN id due to incorrect
   byteorder. Add a new routine to properly populate flow dissector
   ethertypes.

5) Missing keys in ip{6}_route_me_harder() results in incorrect
   routes. This includes an update for selftest infra. Patches
   from Ido Schimmel.

6) Add counter hardware offload support through FLOW_CLS_STATS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19 15:49:50 -07:00
Pablo Neira Ayuso
b72920f6e4 netfilter: nftables: counter hardware offload support
This patch adds the .offload_stats operation to synchronize hardware
stats with the expression data. Update the counter expression to use
this new interface. The hardware stats are retrieved from the netlink
dump path via FLOW_CLS_STATS command to the driver.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-18 22:04:49 +02:00
Pablo Neira Ayuso
783003f3bb netfilter: nftables_offload: special ethertype handling for VLAN
The nftables offload parser sets FLOW_DISSECTOR_KEY_BASIC .n_proto to the
ethertype field in the ethertype frame. However:

- FLOW_DISSECTOR_KEY_BASIC .n_proto field always stores either IPv4 or IPv6
  ethertypes.
- FLOW_DISSECTOR_KEY_VLAN .vlan_tpid stores either the 802.1q and 802.1ad
  ethertypes. Same as for FLOW_DISSECTOR_KEY_CVLAN.

This function adjusts the flow dissector to handle two scenarios:

1) FLOW_DISSECTOR_KEY_VLAN .vlan_tpid is set to 802.1q or 802.1ad.
   Then, transfer:
   - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid.
   - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the
     FLOW_DISSECTOR_KEY_CVLAN .tpid
   - the original FLOW_DISSECTOR_KEY_CVLAN .tpid to the .n_proto field.

2) .n_proto is set to 802.1q or 802.1ad. Then, transfer:
   - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid.
   - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the .n_proto field.

Fixes: a82055af59 ("netfilter: nft_payload: add VLAN offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-18 22:02:21 +02:00
Pablo Neira Ayuso
ff4d90a89d netfilter: nftables_offload: VLAN id needs host byteorder in flow dissector
The flow dissector representation expects the VLAN id in host byteorder.
Add the NFT_OFFLOAD_F_NETWORK2HOST flag to swap the bytes from nft_cmp.

Fixes: a82055af59 ("netfilter: nft_payload: add VLAN offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-18 22:02:21 +02:00
Pablo Neira Ayuso
14c20643ef netfilter: nft_payload: fix C-VLAN offload support
- add another struct flow_dissector_key_vlan for C-VLAN
- update layer 3 dependency to allow to match on IPv4/IPv6

Fixes: 89d8fd44ab ("netfilter: nft_payload: add C-VLAN offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-18 22:02:21 +02:00
Jakub Kicinski
8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Roi Dayan
78ed0a9bc6 netfilter: flowtable: Add FLOW_OFFLOAD_XMIT_UNSPEC xmit type
It could be xmit type was not set and would default to FLOW_OFFLOAD_XMIT_NEIGH
and in this type the gc expect to have a route info.
Fix that by adding FLOW_OFFLOAD_XMIT_UNSPEC which defaults to 0.

Fixes: 8b9229d158 ("netfilter: flowtable: dst_check() from garbage collector path")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:47:00 +02:00
Florian Westphal
9b1a4d0f91 netfilter: conntrack: convert sysctls to u8
log_invalid sysctl allows values of 0 to 255 inclusive so we no longer
need a range check: the min/max values can be removed.

This also removes all member variables that were moved to net_generic
data in previous patches.

This reduces size of netns_ct struct by one cache line.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:10:39 +02:00
Florian Westphal
c53bd0e966 netfilter: conntrack: move ct counter to net_generic data
Its only needed from slowpath (sysctl, ctnetlink, gc worker) and
when a new conntrack object is allocated.

Furthermore, each write dirties the otherwise read-mostly pernet
data in struct net.ct, which are accessed from packet path.

Move it to the net_generic data.  This makes struct netns_ct
read-mostly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:10:39 +02:00
Florian Westphal
f6f2e580d5 netfilter: conntrack: move expect counter to net_generic data
Creation of a new conntrack entry isn't a frequent operation (compared
to 'ct entry already exists').  Creation of a new entry that is also an
expected (related) connection even less so.

Place this counter in net_generic data.

A followup patch will also move the conntrack count -- this will make
netns_ct a read-mostly structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:10:39 +02:00
Florian Westphal
67f28216ca netfilter: conntrack: move autoassign_helper sysctl to net_generic data
While at it, make it an u8, no need to use an integer for a boolean.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:10:39 +02:00
Florian Westphal
098b5d3565 netfilter: conntrack: move autoassign warning member to net_generic data
Not accessed in fast path, place this is generic_net data instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:10:39 +02:00
wenxu
efce49dfe6 netfilter: flowtable: add vlan pop action offload support
This patch adds vlan pop action offload in the flowtable offload.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 13:09:41 +02:00
wenxu
3e1b0c168f netfilter: flowtable: add vlan match offload support
This patch adds support for vlan_id, vlan_priority and vlan_proto match
for flowtable offload.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 12:59:54 +02:00
Pablo Neira Ayuso
4d8f906583 netfilter: nftables: clone set element expression template
memcpy() breaks when using connlimit in set elements. Use
nft_expr_clone() to initialize the connlimit expression list, otherwise
connlimit garbage collector crashes when walking on the list head copy.

[  493.064656] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[  493.064685] RIP: 0010:find_or_evict+0x5a/0x90 [nf_conncount]
[  493.064694] Code: 2b 43 40 83 f8 01 77 0d 48 c7 c0 f5 ff ff ff 44 39 63 3c 75 df 83 6d 18 01 48 8b 43 08 48 89 de 48 8b 13 48 8b 3d ee 2f 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[  493.064699] RSP: 0018:ffffc90000417dc0 EFLAGS: 00010297
[  493.064704] RAX: 0000000000000000 RBX: ffff888134f38410 RCX: 0000000000000000
[  493.064708] RDX: 0000000000000000 RSI: ffff888134f38410 RDI: ffff888100060cc0
[  493.064711] RBP: ffff88812ce594a8 R08: ffff888134f38438 R09: 00000000ebb9025c
[  493.064714] R10: ffffffff8219f838 R11: 0000000000000017 R12: 0000000000000001
[  493.064718] R13: ffffffff82146740 R14: ffff888134f38410 R15: 0000000000000000
[  493.064721] FS:  0000000000000000(0000) GS:ffff88840e440000(0000) knlGS:0000000000000000
[  493.064725] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  493.064729] CR2: 0000000000000008 CR3: 00000001330aa002 CR4: 00000000001706e0
[  493.064733] Call Trace:
[  493.064737]  nf_conncount_gc_list+0x8f/0x150 [nf_conncount]
[  493.064746]  nft_rhash_gc+0x106/0x390 [nf_tables]

Reported-by: Laura Garcia Liebana <nevola@gmail.com>
Fixes: 4094445229 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 00:19:05 +02:00
Florian Westphal
b29c457a65 netfilter: x_tables: fix compat match/target pad out-of-bound write
xt_compat_match/target_from_user doesn't check that zeroing the area
to start of next rule won't write past end of allocated ruleset blob.

Remove this code and zero the entire blob beforehand.

Reported-by: syzbot+cfc0247ac173f597aaaa@syzkaller.appspotmail.com
Reported-by: Andy Nguyen <theflow@google.com>
Fixes: 9fa492cdc1 ("[NETFILTER]: x_tables: simplify compat API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13 00:18:57 +02:00
Jonathon Reinhart
2671fa4dc0 netfilter: conntrack: Make global sysctls readonly in non-init netns
These sysctls point to global variables:
- NF_SYSCTL_CT_MAX (&nf_conntrack_max)
- NF_SYSCTL_CT_EXPECT_MAX (&nf_ct_expect_max)
- NF_SYSCTL_CT_BUCKETS (&nf_conntrack_htable_size_user)

Because their data pointers are not updated to point to per-netns
structures, they must be marked read-only in a non-init_net ns.
Otherwise, changes in any net namespace are reflected in (leaked into)
all other net namespaces. This problem has existed since the
introduction of net namespaces.

The current logic marks them read-only only if the net namespace is
owned by an unprivileged user (other than init_user_ns).

Commit d0febd81ae ("netfilter: conntrack: re-visit sysctls in
unprivileged namespaces") "exposes all sysctls even if the namespace is
unpriviliged." Since we need to mark them readonly in any case, we can
forego the unprivileged user check altogether.

Fixes: d0febd81ae ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces")
Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:27:11 -07:00
Eric Dumazet
b895bdf5d6 netfilter: nft_limit: avoid possible divide error in nft_limit_init
div_u64() divides u64 by u32.

nft_limit_init() wants to divide u64 by u64, use the appropriate
math function (div64_u64)

divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:div_u64_rem include/linux/math64.h:28 [inline]
RIP: 0010:div_u64 include/linux/math64.h:127 [inline]
RIP: 0010:nft_limit_init+0x2a2/0x5e0 net/netfilter/nft_limit.c:85
Code: ef 4c 01 eb 41 0f 92 c7 48 89 de e8 38 a5 22 fa 4d 85 ff 0f 85 97 02 00 00 e8 ea 9e 22 fa 4c 0f af f3 45 89 ed 31 d2 4c 89 f0 <49> f7 f5 49 89 c6 e8 d3 9e 22 fa 48 8d 7d 48 48 b8 00 00 00 00 00
RSP: 0018:ffffc90009447198 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000200000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff875152e6 RDI: 0000000000000003
RBP: ffff888020f80908 R08: 0000200000000000 R09: 0000000000000000
R10: ffffffff875152d8 R11: 0000000000000000 R12: ffffc90009447270
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000000000097a300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c4 CR3: 0000000026a52000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 nf_tables_newexpr net/netfilter/nf_tables_api.c:2675 [inline]
 nft_expr_init+0x145/0x2d0 net/netfilter/nf_tables_api.c:2713
 nft_set_elem_expr_alloc+0x27/0x280 net/netfilter/nf_tables_api.c:5160
 nf_tables_newset+0x1997/0x3150 net/netfilter/nf_tables_api.c:4321
 nfnetlink_rcv_batch+0x85a/0x21b0 net/netfilter/nfnetlink.c:456
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:580 [inline]
 nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:598
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c26844eda9 ("netfilter: nf_tables: Fix nft limit burst handling")
Fixes: 3e0f64b7dd ("netfilter: nft_limit: fix packet ratelimiting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Luigi Rizzo <lrizzo@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-10 21:15:35 +02:00
Florian Westphal
1379940bf8 netfilter: conntrack: move ecache dwork to net_generic infra
dwork struct is large (>128 byte) and not needed when conntrack module
is not loaded.

Place it in net_generic data instead.  The struct net dwork member is now
obsolete and will be removed in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:53 +02:00
Florian Westphal
7b5974709f netfilter: conntrack: move sysctl pointer to net_generic infra
No need to keep this in struct net, place it in the net_generic data.
The sysctl pointer is removed from struct net in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:53 +02:00
Florian Westphal
1d610d4d31 netfilter: x_tables: move known table lists to net_generic infra
Will reduce struct net size by 208 bytes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
0854db2aae netfilter: nf_tables: use net_generic infra for transaction data
This moves all nf_tables pernet data from struct net to a net_generic
extension, with the exception of the gencursor.

The latter is used in the data path and also outside of the nf_tables
core. All others are only used from the configuration plane.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:52 +02:00
Florian Westphal
ebfbe67568 netfilter: cttimeout: use net_generic infra
reduce size of struct net and make this self-contained.
The member in struct net is kept to minimize changes to struct net
layout, it will be removed in a separate patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Florian Westphal
1be05ea766 netfilter: nfnetlink: use net_generic infra
No need to place it in struct net, nfnetlink is a module and usage
doesn't occur in fastpath.

Also remove rcu usage:

Not a single reader of net->nfnl uses rcu accessors.

When exit_batch callbacks are executed the net namespace is already dead
so no calls to these functions are possible anymore (else we'd get NULL
deref crash too).

If the module is removed, then modules that call any of those functions
have been removed too so no calls to nfnl functions are possible either.

The nfnl and nfl_stash pointers in struct net are no longer used, they
will be removed in a followup patch to minimize changes to struct net
(causes rebuild for entire network stack).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Florian Westphal
237c609f87 netfilter: nfnetlink: add and use nfnetlink_broadcast
This removes the only reference of net->nfnl outside of the nfnetlink
module.  This allows to move net->nfnl to net_generic infra.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-06 00:34:51 +02:00
Pablo Neira Ayuso
8c56049fec netfilter: nftables: remove documentation on static functions
Since 4f16d25c68 ("netfilter: nftables: add nft_parse_register_load()
and use it") and 345023b0db ("netfilter: nftables: add
nft_parse_register_store() and use it"), the following functions are not
exported symbols anymore:

- nft_parse_register()
- nft_validate_register_load()
- nft_validate_register_store()

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-03 20:36:03 +02:00
Dan Carpenter
dadf33c9f6 netfilter: nftables: fix a warning message in nf_tables_commit_audit_collect()
The first argument of a WARN_ONCE() is a condition.  This WARN_ONCE()
will only print the table name, and is potentially problematic if the
table name has a %s in it.

Fixes: c520292f29 ("audit: log nftables configuration change events once per table")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-03 20:19:37 +02:00
Florian Westphal
daf47a7c10 netfilter: ipvs: do not printk on netns creation
This causes dmesg spew during normal operation, so remove this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-03 20:17:11 +02:00
Pablo Neira Ayuso
19c28b1374 netfilter: add helper function to set up the nfnetlink header and use it
This patch adds a helper function to set up the netlink and nfnetlink headers.
Update existing codebase to use it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Pablo Neira Ayuso
802b805162 netfilter: nftables: add helper function to set the base sequence number
This patch adds a helper function to calculate the base sequence number
field that is stored in the nfnetlink header. Use the helper function
whenever possible.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Yang Yingliang
7726c9ce71 netfilter: nftables: remove unnecessary spin_lock_init()
The spinlock nf_tables_destroy_list_lock is initialized statically.
It is unnecessary to initialize by spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Pablo Neira Ayuso
8b9229d158 netfilter: flowtable: dst_check() from garbage collector path
Move dst_check() to the garbage collector path. Stale routes trigger the
flow entry teardown state which makes affected flows go back to the
classic forwarding path to re-evaluate flow offloading.

IPv6 requires the dst cookie to work, store it in the flow_tuple,
otherwise dst_check() always fails.

Fixes: e5075c0bad ("netfilter: flowtable: call dst_check() to fall back to classic forwarding")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Richard Guy Briggs
c520292f29 audit: log nftables configuration change events once per table
Reduce logging of nftables events to a level similar to iptables.
Restore the table field to list the table, adding the generation.

Indicate the op as the most significant operation in the event.

A couple of sample events:

type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=roo
t sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=r
oot sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

The issue was originally documented in
https://github.com/linux-audit/audit-kernel/issues/124

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Florian Westphal
cefa31a9d4 netfilter: nft_log: perform module load from nf_tables
modprobe calls from the nf_logger_find_get() API causes deadlock in very
special cases because they occur with the nf_tables transaction mutex held.

In the specific case of nf_log, deadlock is via:

 A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \
	    -> pernet_ops rwsem -> wait for C
 B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A
 C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B

Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load
the backend module during a transaction.

For nft_log we would have to add a softdep for both nfnetlink_log or
nf_log_syslog, since we do not know in advance which of the two backends
are going to be configured.

This defers the modprobe op until after the transaction mutex is released.

Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:11 +02:00
Florian Westphal
a38b5b56d6 netfilter: nf_log: add module softdeps
xt_LOG has no direct dependency on the syslog-based logger, it relies
on the nf_log core to probe the requested backend.

Now that all syslog-based loggers reside in the same module, we can
just add a soft dependency on nf_log_syslog and let modprobe take
care of it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:10 +02:00
Florian Westphal
e465cccd0b netfilter: nf_log_common: merge with nf_log_syslog
Remove nf_log_common.  Now that all per-af modules have been merged
there is no longer a need to provide a helper module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:10 +02:00
Florian Westphal
77ccee96a6 netfilter: nf_log_bridge: merge with nf_log_syslog
Provide bridge log support from nf_log_syslog.

After the merge there is no need to load the "real packet loggers",
all of them now reside in the same module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 22:34:05 +02:00
Pablo Neira Ayuso
fbea31808c netfilter: conntrack: do not print icmpv6 as unknown via /proc
/proc/net/nf_conntrack shows icmpv6 as unknown.

Fixes: 09ec82f5af ("netfilter: conntrack: remove protocol name from l4proto struct")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 01:12:47 +02:00
Pablo Neira Ayuso
0e07e25b48 netfilter: flowtable: fix NAT IPv6 offload mangling
Fix out-of-bound access in the address array.

Fixes: 5c27d8d76c ("netfilter: nf_flow_table_offload: add IPv6 support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 01:12:47 +02:00
Florian Westphal
1510618e45 netfilter: nf_log_netdev: merge with nf_log_syslog
Provide netdev family support from the nf_log_syslog module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 00:37:27 +02:00
Florian Westphal
f5466caab9 netfilter: nf_log_ipv6: merge with nf_log_syslog
This removes the nf_log_ipv6 module, the functionality is now
provided by nf_log_syslog.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 00:37:27 +02:00
Florian Westphal
f11d61e795 netfilter: nf_log_arp: merge with nf_log_syslog
similar to previous change: nf_log_syslog now covers ARP logging
as well, the old nf_log_arp module is removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 00:37:27 +02:00
Florian Westphal
db3187ae21 netfilter: nf_log_ipv4: rename to nf_log_syslog
Netfilter has multiple log modules:
 nf_log_arp
 nf_log_bridge
 nf_log_ipv4
 nf_log_ipv6
 nf_log_netdev
 nfnetlink_log
 nf_log_common

With the exception of nfnetlink_log (packet is sent to userspace for
dissection/logging), all of them log to the kernel ringbuffer.

This is the first part of a series to merge all modules except
nfnetlink_log into a single module: nf_log_syslog.

This allows to reduce code.  After the series, only two log modules remain:
nfnetlink_log and nf_log_syslog. The latter provides the same
functionality as the old per-af log modules.

This renames nf_log_ipv4 to nf_log_syslog.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31 00:37:27 +02:00
Bhaskar Chowdhury
bcae6d5faf netfilter: nf_conntrack_acct.c: A typo fix
s/Accouting/Accounting/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-28 17:31:14 -07:00
Bhaskar Chowdhury
f60d94f0d7 netfilter: ipvs: A spello fix
s/registerd/registered/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-28 17:31:13 -07:00
David S. Miller
efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Pablo Neira Ayuso
17e52c0aaa netfilter: flowtable: support for FLOW_ACTION_PPPOE_PUSH
Add a PPPoE push action if layer 2 protocol is ETH_P_PPP_SES to add
PPPoE flowtable hardware offload support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Felix Fietkau
26267bf9bb netfilter: flowtable: bridge vlan hardware offload and switchdev
The switch might have already added the VLAN tag through PVID hardware
offload. Keep this extra VLAN in the flowtable but skip it on egress.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
73f97025a9 netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled
If there is a forward path to reach an ethernet device and hardware
offload is enabled, then use the direct xmit path.

Moreover, store the real device in the direct xmit path info since
software datapath uses dev_hard_header() to push the layer encapsulation
headers while hardware offload refers to the real device.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
eeff3000f2 netfilter: flowtable: add offload support for xmit path types
When the flow tuple xmit_type is set to FLOW_OFFLOAD_XMIT_DIRECT, the
dst_cache pointer is not valid, and the h_source/h_dest/ifidx out fields
need to be used.

This patch also adds the FLOW_ACTION_VLAN_PUSH action to pass the VLAN
tag to the driver.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
a11e7973cf netfilter: flowtable: add dsa support
Replace the master ethernet device by the dsa slave port. Packets coming
in from the software ingress path use the dsa slave port as input
device.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
72efd585f7 netfilter: flowtable: add pppoe support
Add the PPPoE protocol and session id to the flow tuple using the encap
fields to uniquely identify flows from the receive path. For the
transmit path, dev_hard_header() on the vlan device push the headers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
e990cef651 netfilter: flowtable: add bridge vlan filtering support
Add the vlan tag based when PVID is set on.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
4cd91f7c29 netfilter: flowtable: add vlan support
Add the vlan id and protocol to the flow tuple to uniquely identify
flows from the receive path. For the transmit path, dev_hard_header() on
the vlan device push the headers. This patch includes support for two
vlan headers (QinQ) from the ingress path.

Add a generic encap field to the flowtable entry which stores the
protocol and the tag id. This allows to reuse these fields in the PPPoE
support coming in a later patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
7a27f6ab41 netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
The egress device in the tuple is obtained from route. Use
dev_fill_forward_path() instead to provide the real egress device for
this flow whenever this is available.

The new FLOW_OFFLOAD_XMIT_DIRECT type uses dev_queue_xmit() to transmit
ethernet frames. Cache the source and destination hardware address to
use dev_queue_xmit() to transfer packets.

The FLOW_OFFLOAD_XMIT_DIRECT replaces FLOW_OFFLOAD_XMIT_NEIGH if
dev_fill_forward_path() finds a direct transmit path.

In case of topology updates, if peer is moved to different bridge port,
the connection will time out, reconnect will result in a new entry with
the correct path. Snooping fdb updates would allow for cleaning up stale
flowtable entries.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
c63a7cc4d7 netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
Obtain the ingress device in the tuple from the route in the reply
direction. Use dev_fill_forward_path() instead to get the real ingress
device for this flow.

Fall back to use the ingress device that the IP forwarding route
provides if:

- dev_fill_forward_path() finds no real ingress device.
- the ingress device that is obtained is not part of the flowtable
  devices.
- this route has a xfrm policy.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
5139c0c007 netfilter: flowtable: add xmit path types
Add the xmit_type field that defines the two supported xmit paths in the
flowtable data plane, which are the neighbour and the xfrm xmit paths.
This patch prepares for new flowtable xmit path types to come.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 12:48:39 -07:00
Pablo Neira Ayuso
0ce7cf4127 netfilter: nftables: update table flags from the commit phase
Do not update table flags from the preparation phase. Store the flags
update into the transaction, then update the flags from the commit
phase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 01:35:39 +01:00
Pablo Neira Ayuso
86fe2c19ee netfilter: nftables: skip hook overlap logic if flowtable is stale
If the flowtable has been previously removed in this batch, skip the
hook overlap checks. This fixes spurious EEXIST errors when removing and
adding the flowtable in the same batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 01:08:54 +01:00
Pablo Neira Ayuso
1b9cd7690a netfilter: flowtable: refresh timeout after dst and writable checks
Refresh the timeout (and retry hardware offload) once the skbuff dst
is confirmed to be current and after the skbuff is made writable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
e5075c0bad netfilter: flowtable: call dst_check() to fall back to classic forwarding
In case the route is stale, pass up the packet to the classic forwarding
path for re-evaluation and schedule this flow entry for removal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
f4401262b9 netfilter: flowtable: fast NAT functions never fail
Simplify existing fast NAT routines by returning void. After the
skb_try_make_writable() call consolidation, these routines cannot ever
fail.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
4f08f173d0 netfilter: flowtable: move FLOW_OFFLOAD_DIR_MAX away from enumeration
This allows to remove the default case which should not ever happen and
that was added to avoid gcc warnings on unhandled FLOW_OFFLOAD_DIR_MAX
enumeration case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
2babb46c8c netfilter: flowtable: move skb_try_make_writable() before NAT in IPv4
For consistency with the IPv6 flowtable datapath and to make sure the
skbuff is writable right before the NAT header updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:44:00 +01:00
Pablo Neira Ayuso
2fc11745c3 netfilter: flowtable: consolidate skb_try_make_writable() call
Fetch the layer 4 header size to be mangled by NAT when building the
tuple, then use it to make writable the network and the transport
headers. After this update, the NAT routines now assumes that the skbuff
area is writable. Do the pointer refetch only after the single
skb_try_make_writable() call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:43:59 +01:00
Gustavo A. R. Silva
c2168e6bd7 netfilter: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:48 +01:00
Yinjun Zhang
740b486a8d netfilter: flowtable: Make sure GC works periodically in idle system
Currently flowtable's GC work is initialized as deferrable, which
means GC cannot work on time when system is idle. So the hardware
offloaded flow may be deleted for timeout, since its used time is
not timely updated.

Resolve it by initializing the GC work as delayed work instead of
deferrable.

Fixes: c29f74e0df ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:21 +01:00
Pablo Neira Ayuso
7b35582cd0 netfilter: nftables: allow to update flowtable flags
Honor flowtable flags from the control update path. Disallow disabling
to toggle hardware offload support though.

Fixes: 8bb69f3b29 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:21 +01:00
Pablo Neira Ayuso
7e6136f1b7 netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags
Error was not set accordingly.

Fixes: 8bb69f3b29 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:21 +01:00
Ludovic Senecaux
8b2030b430 netfilter: conntrack: Fix gre tunneling over ipv6
This fix permits gre connections to be tracked within ip6tables rules

Signed-off-by: Ludovic Senecaux <linuxludo@free.fr>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:32:06 +01:00
Oz Shlomo
2ed37183ab netfilter: flowtable: separate replace, destroy and stats to different workqueues
Currently the flow table offload replace, destroy and stats work items are
executed on a single workqueue. As such, DESTROY and STATS commands may
be backloged after a burst of REPLACE work items. This scenario can bloat
up memory and may cause active connections to age.

Instatiate add, del and stats workqueues to avoid backlogs of non-dependent
actions. Provide sysfs control over the workqueue attributes, allowing
userspace applications to control the workqueue cpumask.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-18 00:31:19 +01:00
Florian Westphal
b58f33d49e netfilter: ctnetlink: fix dump of the expect mask attribute
Before this change, the mask is never included in the netlink message, so
"conntrack -E expect" always prints 0.0.0.0.

In older kernels the l3num callback struct was passed as argument, based
on tuple->src.l3num. After the l3num indirection got removed, the call
chain is based on m.src.l3num, but this value is 0xffff.

Init l3num to the correct value.

Fixes: f957be9d34 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15 18:42:01 +01:00
Mark Tomlinson
175e476b8c netfilter: x_tables: Use correct memory barriers.
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.

Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in cc00bcaa58 (which is still present), while still
maintaining the same speed of replacing tables.

The smb_mb() barriers potentially slow the packet path, however testing
has shown no measurable change in performance on a 4-core MIPS64
platform.

Fixes: 7f5c6d4f66 ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15 18:42:01 +01:00
Mark Tomlinson
d3d40f2374 Revert "netfilter: x_tables: Switch synchronization to RCU"
This reverts commit cc00bcaa58.

This (and the preceding) patch basically re-implemented the RCU
mechanisms of patch 784544739a. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.

Prior to using RCU a script calling "iptables" approx. 200 times was
taking 1.16s. With RCU this increased to 11.59s.

Revert these patches and fix the issue in a different way.

Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15 18:42:01 +01:00
Pablo Neira Ayuso
bd1777b3a8 netfilter: nftables: bogus check for netlink portID with table owner
The existing branch checks for 0 != table->nlpid which always evaluates
true for tables that have an owner.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-04 04:02:54 +01:00
Pablo Neira Ayuso
2888b080d0 netfilter: nftables: fix possible double hook unregistration with table owner
Skip hook unregistration of owner tables from the netns exit path,
nft_rcv_nl_event() unregisters the table hooks before tearing down
the table content.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-04 03:58:55 +01:00