Commit graph

25 commits

Author SHA1 Message Date
Thadeu Lima de Souza Cascardo
caf3ef7468 netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[   23.095215] ==================================================================
[   23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[   23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[   23.096358]
[   23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[   23.096770] Call Trace:
[   23.096910]  <IRQ>
[   23.097030]  dump_stack_lvl+0x60/0xc0
[   23.097218]  print_report+0xcf/0x630
[   23.097388]  ? nft_byteorder_eval+0x13c/0x320
[   23.097577]  ? kasan_addr_to_slab+0xd/0xc0
[   23.097760]  ? nft_byteorder_eval+0x13c/0x320
[   23.097949]  kasan_report+0xc9/0x110
[   23.098106]  ? nft_byteorder_eval+0x13c/0x320
[   23.098298]  __asan_load2+0x83/0xd0
[   23.098453]  nft_byteorder_eval+0x13c/0x320
[   23.098659]  nft_do_chain+0x1c8/0xc50
[   23.098852]  ? __pfx_nft_do_chain+0x10/0x10
[   23.099078]  ? __kasan_check_read+0x11/0x20
[   23.099295]  ? __pfx___lock_acquire+0x10/0x10
[   23.099535]  ? __pfx___lock_acquire+0x10/0x10
[   23.099745]  ? __kasan_check_read+0x11/0x20
[   23.099929]  nft_do_chain_ipv4+0xfe/0x140
[   23.100105]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.100327]  ? lock_release+0x204/0x400
[   23.100515]  ? nf_hook.constprop.0+0x340/0x550
[   23.100779]  nf_hook_slow+0x6c/0x100
[   23.100977]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
[   23.101223]  nf_hook.constprop.0+0x334/0x550
[   23.101443]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.101677]  ? __pfx_nf_hook.constprop.0+0x10/0x10
[   23.101882]  ? __pfx_ip_rcv_finish+0x10/0x10
[   23.102071]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[   23.102291]  ? rcu_read_lock_held+0x4b/0x70
[   23.102481]  ip_local_deliver+0xbb/0x110
[   23.102665]  ? __pfx_ip_rcv+0x10/0x10
[   23.102839]  ip_rcv+0x199/0x2a0
[   23.102980]  ? __pfx_ip_rcv+0x10/0x10
[   23.103140]  __netif_receive_skb_one_core+0x13e/0x150
[   23.103362]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[   23.103647]  ? mark_held_locks+0x48/0xa0
[   23.103819]  ? process_backlog+0x36c/0x380
[   23.103999]  __netif_receive_skb+0x23/0xc0
[   23.104179]  process_backlog+0x91/0x380
[   23.104350]  __napi_poll.constprop.0+0x66/0x360
[   23.104589]  ? net_rx_action+0x1cb/0x610
[   23.104811]  net_rx_action+0x33e/0x610
[   23.105024]  ? _raw_spin_unlock+0x23/0x50
[   23.105257]  ? __pfx_net_rx_action+0x10/0x10
[   23.105485]  ? mark_held_locks+0x48/0xa0
[   23.105741]  __do_softirq+0xfa/0x5ab
[   23.105956]  ? __dev_queue_xmit+0x765/0x1c00
[   23.106193]  do_softirq.part.0+0x49/0xc0
[   23.106423]  </IRQ>
[   23.106547]  <TASK>
[   23.106670]  __local_bh_enable_ip+0xf5/0x120
[   23.106903]  __dev_queue_xmit+0x789/0x1c00
[   23.107131]  ? __pfx___dev_queue_xmit+0x10/0x10
[   23.107381]  ? find_held_lock+0x8e/0xb0
[   23.107585]  ? lock_release+0x204/0x400
[   23.107798]  ? neigh_resolve_output+0x185/0x350
[   23.108049]  ? mark_held_locks+0x48/0xa0
[   23.108265]  ? neigh_resolve_output+0x185/0x350
[   23.108514]  neigh_resolve_output+0x246/0x350
[   23.108753]  ? neigh_resolve_output+0x246/0x350
[   23.109003]  ip_finish_output2+0x3c3/0x10b0
[   23.109250]  ? __pfx_ip_finish_output2+0x10/0x10
[   23.109510]  ? __pfx_nf_hook+0x10/0x10
[   23.109732]  __ip_finish_output+0x217/0x390
[   23.109978]  ip_finish_output+0x2f/0x130
[   23.110207]  ip_output+0xc9/0x170
[   23.110404]  ip_push_pending_frames+0x1a0/0x240
[   23.110652]  raw_sendmsg+0x102e/0x19e0
[   23.110871]  ? __pfx_raw_sendmsg+0x10/0x10
[   23.111093]  ? lock_release+0x204/0x400
[   23.111304]  ? __mod_lruvec_page_state+0x148/0x330
[   23.111567]  ? find_held_lock+0x8e/0xb0
[   23.111777]  ? find_held_lock+0x8e/0xb0
[   23.111993]  ? __rcu_read_unlock+0x7c/0x2f0
[   23.112225]  ? aa_sk_perm+0x18a/0x550
[   23.112431]  ? filemap_map_pages+0x4f1/0x900
[   23.112665]  ? __pfx_aa_sk_perm+0x10/0x10
[   23.112880]  ? find_held_lock+0x8e/0xb0
[   23.113098]  inet_sendmsg+0xa0/0xb0
[   23.113297]  ? inet_sendmsg+0xa0/0xb0
[   23.113500]  ? __pfx_inet_sendmsg+0x10/0x10
[   23.113727]  sock_sendmsg+0xf4/0x100
[   23.113924]  ? move_addr_to_kernel.part.0+0x4f/0xa0
[   23.114190]  __sys_sendto+0x1d4/0x290
[   23.114391]  ? __pfx___sys_sendto+0x10/0x10
[   23.114621]  ? __pfx_mark_lock.part.0+0x10/0x10
[   23.114869]  ? lock_release+0x204/0x400
[   23.115076]  ? find_held_lock+0x8e/0xb0
[   23.115287]  ? rcu_is_watching+0x23/0x60
[   23.115503]  ? __rseq_handle_notify_resume+0x6e2/0x860
[   23.115778]  ? __kasan_check_write+0x14/0x30
[   23.116008]  ? blkcg_maybe_throttle_current+0x8d/0x770
[   23.116285]  ? mark_held_locks+0x28/0xa0
[   23.116503]  ? do_syscall_64+0x37/0x90
[   23.116713]  __x64_sys_sendto+0x7f/0xb0
[   23.116924]  do_syscall_64+0x59/0x90
[   23.117123]  ? irqentry_exit_to_user_mode+0x25/0x30
[   23.117387]  ? irqentry_exit+0x77/0xb0
[   23.117593]  ? exc_page_fault+0x92/0x140
[   23.117806]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   23.118081] RIP: 0033:0x7f744aee2bba
[   23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[   23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[   23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[   23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[   23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[   23.121617]  </TASK>
[   23.121749]
[   23.121845] The buggy address belongs to the virtual mapping at
[   23.121845]  [ffffc90000000000, ffffc90000009000) created by:
[   23.121845]  irq_init_percpu_irqstack+0x1cf/0x270
[   23.122707]
[   23.122803] The buggy address belongs to the physical page:
[   23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[   23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[   23.123998] page_type: 0xffffffff()
[   23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[   23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   23.125023] page dumped because: kasan: bad access detected
[   23.125326]
[   23.125421] Memory state around the buggy address:
[   23.125682]  ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   23.126072]  ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[   23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[   23.126840]                                               ^
[   23.127138]  ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[   23.127522]  ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[   23.127906] ==================================================================
[   23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-06 00:53:14 +02:00
Florian Westphal
a412dbf40f netfilter: nf_tables: limit allowed range via nla_policy
These NLA_U32 types get stored in u8 fields, reject invalid values
instead of silently casting to u8.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-26 08:05:57 +02:00
Phil Sutter
7d34aa3e03 netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters
Add a 'reset' flag just like with nft_object_ops::dump. This will be
useful to reset "anonymous stateful objects", e.g. simple rule counters.

No functional change intended.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-15 10:46:34 +01:00
Florian Westphal
d86473bf2f netfilter: nf_tables: use the correct get/put helpers
Switch to be16/32 and u16/32 respectively.  No code changes here,
the functions do the same thing, this is just for sparse checkers' sake.

objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Pablo Neira Ayuso
34cc9e5288 netfilter: nf_tables: cancel tracking for clobbered destination registers
Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.

This patch adds three new helper functions:

- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.

Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.

This patch updates the following expressions:

- meta_bridge
- bitwise
- byteorder
- meta
- payload

to use these helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-20 00:29:46 +01:00
Pablo Neira Ayuso
f459bfd4b9 netfilter: nft_byteorder: track register operations
Cancel tracking for byteorder operation, otherwise selector + byteorder
operation is incorrectly reduced if source and destination registers are
the same.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-27 00:07:24 +01:00
Pablo Neira Ayuso
345023b0db netfilter: nftables: add nft_parse_register_store() and use it
This new function combines the netlink register attribute parser
and the store validation function.

This update requires to replace:

        enum nft_registers      dreg:8;

in many of the expression private areas otherwise compiler complains
with:

        error: cannot take address of bit-field ‘dreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-27 23:16:02 +01:00
Pablo Neira Ayuso
4f16d25c68 netfilter: nftables: add nft_parse_register_load() and use it
This new function combines the netlink register attribute parser
and the load validation function.

This update requires to replace:

	enum nft_registers      sreg:8;

in many of the expression private areas otherwise compiler complains
with:

	error: cannot take address of bit-field ‘sreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-27 22:53:29 +01:00
Ander Juaristi
a1b840adaf netfilter: nf_tables: Introduce new 64-bit helper register functions
Introduce new helper functions to load/store 64-bit values onto/from
registers:

 - nft_reg_store64
 - nft_reg_load64

This commit also re-orders all these helpers from smallest to largest
target bit size.

Signed-off-by: Ander Juaristi <a@juaristi.eus>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-26 11:01:00 +02:00
Thomas Gleixner
d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Florian Westphal
10870dd89e netfilter: nf_tables: add direct calls for all builtin expressions
With CONFIG_RETPOLINE its faster to add an if (ptr == &foo_func)
check and and use direct calls for all the built-in expressions.

~15% improvement in pathological cases.

checkpatch doesn't like the X macro due to the embedded return statement,
but the macro has a very limited scope so I don't think its a problem.

I would like to avoid bugs of the form
  If (e->ops->eval == (unsigned long)nft_foo_eval)
	 nft_bar_eval();

and open-coded if ()/else if()/else cascade, thus the macro.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18 15:02:33 +01:00
Liping Zhang
4e24877e61 netfilter: nf_tables: simplify the basic expressions' init routine
Some basic expressions are built into nf_tables.ko, such as nft_cmp,
nft_lookup, nft_range and so on. But these basic expressions' init
routine is a little ugly, too many goto errX labels, and we forget
to call nft_range_module_exit in the exit routine, although it is
harmless.

Acctually, the init and exit routines of these basic expressions
are same, i.e. do nft_register_expr in the init routine and do
nft_unregister_expr in the exit routine.

So it's better to arrange them into an array and deal with them
together.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-09 23:42:23 +01:00
Laura Garcia Liebana
36b701fae1 netfilter: nf_tables: validate maximum value of u32 netlink attributes
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.

This patch revisits 4da449ae1d ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: 96518518cc ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:29:02 +02:00
Florian Westphal
4b8c4eddfc netfilter: nft_byteorder: avoid unneeded le/be conversion steps
David points out that we to three le/be conversions instead
of just one.  Doesn't matter on x86_64 w. gcc, but other
architectures might be less lucky.

Since it also simplifies code just follow his advice.

Fixes: c0f3275f5cb ("nftables: byteorder: provide le/be 64 bit conversion helper")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-01-13 14:02:59 +01:00
Florian Westphal
ce1e7989d9 netfilter: nft_byteorder: provide 64bit le/be conversion
Needed to convert the (64bit) conntrack counters to BE ordering.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-01-08 13:32:11 +01:00
Patrick McHardy
49499c3e6e netfilter: nf_tables: switch registers to 32 bit addressing
Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.

The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.

To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.

Userspace fully passes the testsuite using both old and new register
addressing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:29 +02:00
Patrick McHardy
b1c96ed37c netfilter: nf_tables: add register parsing/dumping helpers
Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:28 +02:00
Patrick McHardy
fad136ea0d netfilter: nf_tables: convert expressions to u32 register pointers
Simple conversion to use u32 pointers to the beginning of the registers
to keep follow up patches smaller.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:25 +02:00
Patrick McHardy
a55e22e92f netfilter: nf_tables: get rid of NFT_REG_VERDICT usage
Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a seperate member, which aliases to the
NFT_REG_VERDICT register.

This is needed to seperate the verdict from the data registers completely,
so their size can be changed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:07 +02:00
Patrick McHardy
d07db9884a netfilter: nf_tables: introduce nft_validate_register_load()
Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:50 +02:00
Patrick McHardy
27e6d2017a netfilter: nf_tables: kill nft_validate_output_register()
All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:50 +02:00
Patrick McHardy
1ec10212f9 netfilter: nf_tables: rename nft_validate_data_load()
The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:49 +02:00
Patrick McHardy
45d9bcda21 netfilter: nf_tables: validate len in nft_validate_data_load()
For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 16:25:49 +02:00
Patrick McHardy
ef1f7df917 netfilter: nf_tables: expression ops overloading
Split the expression ops into two parts and support overloading of
the runtime expression ops based on the requested function through
a ->select_ops() callback.

This can be used to provide optimized implementations, for instance
for loading small aligned amounts of data from the packet or inlining
frequently used operations into the main evaluation loop.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:16:08 +02:00
Patrick McHardy
96518518cc netfilter: add nftables
This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 17:15:48 +02:00