Commit Graph

146 Commits

Author SHA1 Message Date
Kirill Tkhai a5a179b6df net: Convert ip_set_net_ops
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:

	#ifdef CONFIG_IP_SET
	extern func(void);
	#else
	static inline func(void);
	#endif

So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).

ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05 10:48:28 -05:00
Pablo Neira Ayuso e553116652 netfilter: remove messages print and boot/module load time
Several reasons for this:

* Several modules maintain internal version numbers, that they print at
  boot/module load time, that are not exposed to userspace, as a
  primitive mechanism to make revision number control from the earlier
  days of Netfilter.

* IPset shows the protocol version at boot/module load time, instead
  display this via module description, as Jozsef suggested.

* Remove copyright notice at boot/module load time in two spots, the
  Netfilter codebase is a collective development effort, if we would
  have to display copyrights for each contributor at boot/module load
  time for each extensions we have, we would probably fill up logs with
  lots of useless information - from a technical standpoint.

So let's be consistent and remove them all.

Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-19 18:39:49 +01:00
Jozsef Kadlecsik f998b6b101 netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ip_set_net_exit()
Patch "netfilter: ipset: use nfnl_mutex_is_locked" is added the real
mutex locking check, which revealed the missing locking in ip_set_net_exit().

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: syzbot+36b06f219f2439fe62e1@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:11:12 +01:00
Jozsef Kadlecsik 4750005a85 netfilter: ipset: Fix "don't update counters" mode when counters used at the matching
The matching of the counters was not taken into account, fixed.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:11:12 +01:00
Florian Westphal a778a15fa5 netfilter: ipset: add resched points during set listing
When sets are extremely large we can get softlockup during ipset -L.
We could fix this by adding cond_resched_rcu() at the right location
during iteration, but this only works if RCU nesting depth is 1.

At this time entire variant->list() is called under under rcu_read_lock_bh.
This used to be a read_lock_bh() but as rcu doesn't really lock anything,
it does not appear to be needed, so remove it (ipset increments set
reference count before this, so a set deletion should not be possible).

Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:04 +01:00
Florian Westphal 49971b8853 netfilter: ipset: use nfnl_mutex_is_locked
Check that we really hold nfnl mutex here instead of relying on correct
usage alone.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:03 +01:00
Gustavo A. R. Silva e8542dcec0 netfilter: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08 18:01:01 +01:00
Ross Lagerwall e5173418ac netfilter: ipset: Fix race between dump and swap
Fix a race between ip_set_dump_start() and ip_set_swap().
The race is as follows:
* Without holding the ref lock, ip_set_swap() checks ref_netlink of the
  set and it is 0.
* ip_set_dump_start() takes a reference on the set.
* ip_set_swap() does the swap (even though it now has a non-zero
  reference count).
* ip_set_dump_start() gets the set from ip_set_list again which is now a
  different set since it has been swapped.
* ip_set_dump_start() calls __ip_set_put_netlink() and hits a BUG_ON due
  to the reference count being 0.

Fix this race by extending the critical region in which the ref lock is
held to include checking the ref counts.

The race can be reproduced with the following script:
  while :; do
    ipset destroy hash_ip1
    ipset destroy hash_ip2
    ipset create hash_ip1 hash:ip family inet hashsize 1024 \
        maxelem 500000
    ipset create hash_ip2 hash:ip family inet hashsize 300000 \
        maxelem 500000
    ipset create hash_ip3 hash:ip family inet hashsize 1024 \
        maxelem 500000
    ipset save &
    ipset swap hash_ip3 hash_ip2
    ipset destroy hash_ip3
    wait
  done

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-29 12:15:14 +02:00
Florian Westphal e23ed762db netfilter: ipset: pernet ops must be unregistered last
Removing the ipset module leaves a small window where one cpu performs
module removal while another runs a command like 'ipset flush'.

ipset uses net_generic(), unregistering the pernet ops frees this
storage area.

Fix it by first removing the user-visible api handlers and the pernet
ops last.

Fixes: 1785e8f473 ("netfiler: ipset: Add net namespace for ipset")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-26 20:15:17 +02:00
Pablo Neira Ayuso 04ba724b65 netfilter: nfnetlink: extended ACK reporting
Pass down struct netlink_ext_ack as parameter to all of our nfnetlink
subsystem callbacks, so we can work on follow up patches to provide
finer grain error reporting using the new infrastructure that
2d4bc93368 ("netlink: extended ACK reporting") provides.

No functional change, just pass down this new object to callbacks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:38:24 +02:00
David S. Miller a01aa920b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. A large bunch of code cleanups, simplify the conntrack extension
codebase, get rid of the fake conntrack object, speed up netns by
selective synchronize_net() calls. More specifically, they are:

1) Check for ct->status bit instead of using nfct_nat() from IPVS and
   Netfilter codebase, patch from Florian Westphal.

2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.

3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.

4) Introduce nft_is_base_chain() helper function.

5) Enforce expectation limit from userspace conntrack helper,
   from Gao Feng.

6) Add nf_ct_remove_expect() helper function, from Gao Feng.

7) NAT mangle helper function return boolean, from Gao Feng.

8) ctnetlink_alloc_expect() should only work for conntrack with
   helpers, from Gao Feng.

9) Add nfnl_msg_type() helper function to nfnetlink to build the
   netlink message type.

10) Get rid of unnecessary cast on void, from simran singhal.

11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
    also from simran singhal.

12) Use list_prev_entry() from nf_tables, from simran signhal.

13) Remove unnecessary & on pointer function in the Netfilter and IPVS
    code.

14) Remove obsolete comment on set of rules per CPU in ip6_tables,
    no longer true. From Arushi Singhal.

15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.

16) Remove unnecessary nested rcu_read_lock() in
    __nf_nat_decode_session(). Code running from hooks are already
    guaranteed to run under RCU read side.

17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.

18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
    also from Aaron.

19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole.

20) Don't propagate NF_DROP error to userspace via ctnetlink in
    __nf_nat_alloc_null_binding() function, from Gao Feng.

21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
    from Gao Feng.

22) Kill the fake untracked conntrack objects, use ctinfo instead to
    annotate a conntrack object is untracked, from Florian Westphal.

23) Remove nf_ct_is_untracked(), now obsolete since we have no
    conntrack template anymore, from Florian.

24) Add event mask support to nft_ct, also from Florian.

25) Move nf_conn_help structure to
    include/net/netfilter/nf_conntrack_helper.h.

26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
    Thus, we don't deal with variable conntrack extensions anymore.
    Make sure userspace conntrack helper doesn't go over that size.
    Remove variable size ct extension infrastructure now this code
    got no more clients. From Florian Westphal.

27) Restore offset and length of nf_ct_ext structure to 8 bytes now
    that wraparound is not possible any longer, also from Florian.

28) Allow to get rid of unassured flows under stress in conntrack,
    this applies to DCCP, SCTP and TCP protocols, from Florian.

29) Shrink size of nf_conntrack_ecache structure, from Florian.

30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
    from Gao Feng.

31) Register SYNPROXY hooks on demand, from Florian Westphal.

32) Use pernet hook whenever possible, instead of global hook
    registration, from Florian Westphal.

33) Pass hook structure to ebt_register_table() to consolidate some
    infrastructure code, from Florian Westphal.

34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
    SYNPROXY code, to make sure device stats are not fooled, patch
    from Gao Feng.

35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we
    don't need anymore if we just select a fixed size instead of
    expensive runtime time calculation of this. From Florian.

36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
    from Florian.

37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
    Florian.

38) Attach NAT extension on-demand from masquerade and pptp helper
    path, from Florian.

39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.

40) Speed up netns by selective calls of synchronize_net(), from
    Florian Westphal.

41) Silence stack size warning gcc in 32-bit arch in snmp helper,
    from Florian.

42) Inconditionally call nf_ct_ext_destroy(), even if we have no
    extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
    Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:47:53 -04:00
Aaron Conole db268d4dfd ipset: remove unused function __ip_set_get_netlink
There are no in-tree callers.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15 10:24:41 +02:00
Johannes Berg fceb6435e8 netlink: pass extended ACK struct to parsing functions
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg 2d4bc93368 netlink: extended ACK reporting
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:20 -04:00
Arushi Singhal d4ef383541 netfilter: Remove exceptional & on function name
Remove & from function pointers to conform to the style found elsewhere
in the file. Done using the following semantic patch

// <smpl>
@r@
identifier f;
@@

f(...) { ... }
@@
identifier r.f;
@@

- &f
+ f
// </smpl>

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07 18:24:47 +02:00
simran singhal 68ad546aef netfilter: Remove unnecessary cast on void pointer
The following Coccinelle script was used to detect this:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T*)x)->f
|

- (T*)
  e
)

Unnecessary parantheses are also remove.

Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07 17:29:17 +02:00
Pablo Neira Ayuso dedb67c4b4 netfilter: Add nfnl_msg_type() helper function
Add and use nfnl_msg_type() function to replace opencoded nfnetlink
message type. I suggested this change, Arushi Singhal made an initial
patch to address this but was missing several spots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07 16:31:36 +02:00
Alexey Dobriyan c7d03a00b5 netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 10:59:15 -05:00
Jozsef Kadlecsik 9e41f26a50 netfilter: ipset: Count non-static extension memory for userspace
Non-static (i.e. comment) extension was not counted into the memory
size. A new internal counter is introduced for this. In the case of
the hash types the sizes of the arrays are counted there as well so
that we can avoid to scan the whole set when just the header data
is requested.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:45 +01:00
Jozsef Kadlecsik bec810d973 netfilter: ipset: Improve skbinfo get/init helpers
Use struct ip_set_skbinfo in struct ip_set_ext instead of open
coded fields and assign structure members in get/init helpers
instead of copying members one by one. Explicitly note that
struct ip_set_skbinfo must be padded to prevent non-aligned
access in the extension blob.

Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.

Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-11-10 13:28:42 +01:00
Pablo Neira Ayuso 613dbd9572 netfilter: x_tables: move hook state into xt_action_param structure
Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-03 10:56:21 +01:00
Vishwanath Pai 596cf3fe58 netfilter: ipset: fix race condition in ipset save, swap and delete
This fix adds a new reference counter (ref_netlink) for the struct ip_set.
The other reference counter (ref) can be swapped out by ip_set_swap and we
need a separate counter to keep track of references for netlink events
like dump. Using the same ref counter for dump causes a race condition
which can be demonstrated by the following script:

ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
counters

ipset save &

ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3 /* will crash the machine */

Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. With this fix in place swap will not succeed because ipset save
still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).

Both delete and swap will error out if ref_netlink != 0 on the set.

Note: The changes to *_head functions is because previously we would
increment ref whenever we called these functions, we don't do that
anymore.

Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-28 17:57:45 +02:00
Jozsef Kadlecsik 45040978c8 netfilter: ipset: Fix set:list type crash when flush/dump set in parallel
Flushing/listing entries was not RCU safe, so parallel flush/dump
could lead to kernel crash. Bug reported by Deniz Eren.

Fixes netfilter bugzilla id #1050.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2016-02-24 20:32:21 +01:00
Pablo Neira Ayuso 7b8002a151 netfilter: nfnetlink: pass down netns pointer to call() and call_rcu()
Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:41:41 +01:00
Jozsef Kadlecsik 95ad1f4a93 netfilter: ipset: Fix extension alignment
The data extensions in ipset lacked the proper memory alignment and
thus could lead to kernel crash on several architectures. Therefore
the structures have been reorganized and alignment attributes added
where needed. The patch was tested on armv7h by Gerhard Wiesinger and
on x86_64, sparc64 by Jozsef Kadlecsik.

Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07 11:21:47 +01:00
Eric W. Biederman 686c9b5080 netfilter: x_tables: Use par->net instead of computing from the passed net devices
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 21:58:25 +02:00
Jozsef Kadlecsik ca0f6a5cd9 netfilter: ipset: Fix coding styles reported by checkpatch.pl
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:18 +02:00
Jozsef Kadlecsik b57b2d1fa5 netfilter: ipset: Prepare the ipset core to use RCU at set level
Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking
accordingly. Convert the comment extension into an rcu-avare object. Also,
simplify the timeout routines.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik 9c1ba5c809 netfilter: ipset: Make sure listing doesn't grab a set which is just being destroyed.
There was a small window when all sets are destroyed and a concurrent
listing of all sets could grab a set which is just being destroyed.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik c4c997839c netfilter: ipset: Fix parallel resizing and listing of the same set
When elements added to a hash:* type of set and resizing triggered,
parallel listing could start to list the original set (before resizing)
and "continue" with listing the new set. Fix it by references and
using the original hash table for listing. Therefore the destroying of
the original hash table may happen from the resizing or listing functions.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:15 +02:00
Sergey Popovich 7dd37bc8e6 netfilter: ipset: Check extensions attributes before getting extensions.
Make all extensions attributes checks within ip_set_get_extensions()
and reduce number of duplicated code.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:13 +02:00
Sergey Popovich edda079174 netfilter: ipset: Use SET_WITH_*() helpers to test set extensions
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-06-14 10:40:12 +02:00
Denys Vlasenko a3b1c1eb50 netfilter: ipset: deinline ip_set_put_extensions()
On x86 allyesconfig build:
The function compiles to 489 bytes of machine code.
It has 25 callsites.

    text    data       bss       dec     hex filename
82441375 22255384 20627456 125324215 7784bb7 vmlinux.before
82434909 22255384 20627456 125317749 7783275 vmlinux

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: David S. Miller <davem@davemloft.net>
CC: Jan Engelhardt <jengelh@medozas.de>
CC: Jiri Pirko <jpirko@redhat.com>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: netfilter-devel@vger.kernel.org
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-14 12:51:19 +02:00
Sergey Popovich caed0ed35b netfilter: ipset: Properly calculate extensions offsets and total length
Offsets and total length returned by the ip_set_elem_len()
calculated incorrectly as initial set element length (i.e.
len parameter) is used multiple times in offset calculations,
also affecting set element total length.

Use initial set element length as start offset, do not add aligned
extension offset to the offset. Return offset as total length of
the set element.

This reduces memory requirements on per element basic for the
hash:* type of sets.

For example output from 'ipset -terse list test-1' on 64-bit PC,
where test-1 is generated via following script:

  #!/bin/bash

  set_name='test-1'

  ipset create "$set_name" hash:net family inet \
              timeout 10800 counters comment \
              hashsize 65536 maxelem 65536

  declare -i o3 o4
  fmt="add $set_name 192.168.%u.%u\n"

  for ((o3 = 0; o3 < 256; o3++)); do
      for ((o4 = 0; o4 < 256; o4++)); do
          printf "$fmt" $o3 $o4
      done
  done |ipset -exist restore

BEFORE this patch is applied

  # ipset -terse list test-1
  Name: test-1
  Type: hash:net
  Revision: 6
  Header: family inet hashsize 65536 maxelem 65536
timeout 10800 counters comment
  Size in memory: 26348440

and AFTER applying patch

  # ipset -terse list test-1
  Name: test-1
  Type: hash:net
  Revision: 6
  Header: family inet hashsize 65536 maxelem 65536
timeout 10800 counters comment
  Size in memory: 7706392
  References: 0

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-13 13:25:45 +02:00
Jozsef Kadlecsik 22496f098b netfilter: ipset: Give a better name to a macro in ip_set_core.c
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-13 13:25:44 +02:00
Dan Carpenter 2196937e12 netfilter: ipset: small potential read beyond the end of buffer
We could be reading 8 bytes into a 4 byte buffer here.  It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-11 13:46:37 +01:00
Dan Carpenter 0f9f5e1b83 netfilter: ipset: off by one in ip_set_nfnl_get_byindex()
The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-22 14:12:50 +02:00
Linus Torvalds 35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
Anton Danilov 0e9871e3f7 netfilter: ipset: Add skbinfo extension kernel support in the ipset core.
Skbinfo extension provides mapping of metainformation with lookup in the ipset tables.
This patch defines the flags, the constants, the functions and the structures
for the data type independent support of the extension.
Note the firewall mark stores in the kernel structures as two 32bit values,
but transfered through netlink as one 64bit value.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Jozsef Kadlecsik 73e64e1813 netfilter: ipset: Fix static checker warning in ip_set_core.c
Dan Carpenter reported the following static checker warning:

        net/netfilter/ipset/ip_set_core.c:1414 call_ad()
        error: 'nlh->nlmsg_len' from user is not capped properly

The payload size is limited now by the max size of size_t.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Joe Perches b167a37c7b netfilter: Convert pr_warning to pr_warn
Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Masanari Iida 1a84db567a treewide: fix errors in printk
This patch fix spelling typo in printk.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-01 11:18:25 +02:00
WANG Cong 4cb28970a2 net: use the new API kvfree()
It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:49:51 -07:00
Josh Hunt 07cf8f5ae2 netfilter: ipset: add forceadd kernel support for hash set types
Adds a new property for hash set types, where if a set is created
with the 'forceadd' option and the set becomes full the next addition
to the set may succeed and evict a random entry from the set.

To keep overhead low eviction is done very simply. It checks to see
which bucket the new entry would be added. If the bucket's pos value
is non-zero (meaning there's at least one entry in the bucket) it
replaces the first entry in the bucket. If pos is zero, then it continues
down the normal add process.

This property is useful if you have a set for 'ban' lists where it may
not matter if you release some entries from the set early.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06 09:31:43 +01:00
Ilia Mirkin 6843bc3c56 netfilter: ipset: move registration message to init from net_init
Commit 1785e8f473 ("netfiler: ipset: Add net namespace for ipset") moved
the initialization print into net_init, which can get called a lot due
to namespaces. Move it back into init, reduce to pr_info.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06 09:31:43 +01:00
Sergey Popovich 35f6e63abe netfilter: ipset: Follow manual page behavior for SET target on list:set
ipset(8) for list:set says:
  The match will try to find a matching entry in the sets and the
  target will try to add an entry to the first set to which it can
  be added.

However real behavior is bit differ from described. Consider example:

 # ipset create test-1-v4 hash:ip family inet
 # ipset create test-1-v6 hash:ip family inet6
 # ipset create test-1 list:set
 # ipset add test-1 test-1-v4
 # ipset add test-1 test-1-v6

 # iptables  -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src
 # ip6tables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src

And then when iptables/ip6tables rule matches packet IPSET target
tries to add src from packet to the list:set test-1 where first
entry is test-1-v4 and the second one is test-1-v6.

For IPv4, as it first entry in test-1 src added to test-1-v4
correctly, but for IPv6 src not added!

Placing test-1-v6 to the first element of list:set makes behavior
correct for IPv6, but brokes for IPv4.

This is due to result, returned from ip_set_add() and ip_set_del() from
net/netfilter/ipset/ip_set_core.c when set in list:set equires more
parameters than given or address families do not match (which is this
case).

It seems wrong returning 0 from ip_set_add() and ip_set_del() in
this case, as 0 should be returned only when an element successfuly
added/deleted to/from the set, contrary to ip_set_test() which
returns 0 when no entry exists and >0 when entry found in set.

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-03-06 09:31:41 +01:00
Patrick McHardy 3e90ebd3c9 netfilter: ip_set: rename nfnl_dereference()/nfnl_set()
The next patch will introduce a nfnl_dereference() macro that actually
checks that the appropriate mutex is held and therefore needs a
subsystem argument.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-25 11:29:18 +01:00
stephen hemminger 02eca9d2cc netfilter: ipset: remove unused code
Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:35 +01:00
Jozsef Kadlecsik 93302880d8 netfilter: ipset: Use netlink callback dump args only
Instead of cb->data, use callback dump args only and introduce symbolic
names instead of plain numbers at accessing the argument members.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22 10:13:59 +02:00
Vitaly Lavrov 1785e8f473 netfiler: ipset: Add net namespace for ipset
This patch adds netns support for ipset.

Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Added initialization code and the destruction of the network namespace ipset subsystem.
In the prototypes of public functions ip_set_* added parameter "struct net*".

The remaining corrections related to the change prototypes of public functions ip_set_*.

The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347

Signed-off-by: Vitaly Lavrov <lve@guap.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:42:52 +02:00
Oliver Smith 68b63f08d2 netfilter: ipset: Support comments for ipset entries in the core.
This adds the core support for having comments on ipset entries.

The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:28 +02:00
Jozsef Kadlecsik 03c8b234e6 netfilter: ipset: Generalize extensions support
Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:27 +02:00
Jozsef Kadlecsik 5e04c0c38c netfilter: ipset: Introduce new operation to get both setname and family
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating a clear error message to the user, which is not
helpful.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-30 21:33:26 +02:00
Jozsef Kadlecsik 169faa2e19 netfilter: ipset: Validate the set family and not the set type family at swapping
This closes netfilter bugzilla #843, reported by Quentin Armitage.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:36:05 +02:00
Jozsef Kadlecsik 0f1799ba1a netfilter: ipset: Consistent userspace testing with nomatch flag
The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:

 # ipset n test hash:net
 # ipset a test 10.0.0.0/24 nomatch
 # ipset t test 10.0.0.1
 10.0.0.1 is NOT in set test.
 # ipset t test 10.0.0.1 nomatch
 10.0.0.1 is in set test.

 # ipset a test 192.168.0.0/24
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is NOT in set test.

 Before the patch the results were

 ...
 # ipset t test 192.168.0.1
 192.168.0.1 is in set test.
 # ipset t test 192.168.0.1 nomatch
 192.168.0.1 is in set test.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16 20:35:55 +02:00
Jozsef Kadlecsik 6e01781d1c netfilter: ipset: set match: add support to match the counters
The new revision of the set match supports to match the counters
and to suppress updating the counters at matching too.

At the set:list types, the updating of the subcounters can be
suppressed as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-29 20:09:03 +02:00
Jozsef Kadlecsik 34d666d489 netfilter: ipset: Introduce the counter extension in the core
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-29 20:08:59 +02:00
Jozsef Kadlecsik 075e64c041 netfilter: ipset: Introduce extensions to elements in the core
Introduce extensions to elements in the core and prepare timeout as
the first one.

This patch also modifies the em_ipset classifier to use the new
extension struct layout.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-29 20:08:54 +02:00
Hong zhi guo 573ce260b3 net-next: replace obsolete NLMSG_* with type safe nlmsg_*
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:25:25 -04:00
David S. Miller b86c761f69 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains two bugfixes for netfilter/ipset via
Jozsef Kadlecsik, they are:

* Fix timeout corruption if sets are resized, by Josh Hunt.

* Fix bogus error report if the flag nomatch is set, from Jozsef.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-26 17:24:26 -05:00
Jozsef Kadlecsik dd82088dab netfilter: ipset: "Directory not empty" error message
When an entry flagged with "nomatch" was tested by ipset, it
returned the error message "Kernel error received:
Directory not empty" instead of "<element> is NOT in set <setname>"
(reported by John Brendler).

The internal error code was not properly transformed before returning
to userspace, fixed.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-02-21 17:35:43 +01:00
Pablo Neira Ayuso c14b78e7de netfilter: nfnetlink: add mutex per subsystem
This patch replaces the global lock to one lock per subsystem.
The per-subsystem lock avoids that processes operating
with different subsystems are synchronized.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-05 04:07:35 +01:00
Jozsef Kadlecsik 9076aea765 netfilter: ipset: Increase the number of maximal sets automatically
The max number of sets was hardcoded at kernel cofiguration time and
could only be modified via a module parameter. The patch adds the support
of increasing the max number of sets automatically, as needed.

The array of sets is incremented by 64 new slots if we run out of
empty slots. The absolute limit for the maximal number of sets
is limited by 65534.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 14:36:08 +01:00
Eric W. Biederman df008c91f8 net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow creation of af_key sockets.
Allow creation of llc sockets.
Allow creation of af_packet sockets.

Allow sending xfrm netlink control messages.

Allow binding to netlink multicast groups.
Allow sending to netlink multicast groups.
Allow adding and dropping netlink multicast groups.
Allow sending to all netlink multicast groups and port ids.

Allow reading the netfilter SO_IP_SET socket option.
Allow sending netfilter netlink messages.
Allow setting and getting ip_vs netfilter socket options.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Jozsef Kadlecsik 3e0304a583 netfilter: ipset: Support to match elements marked with "nomatch"
Exceptions can now be matched and we can branch according to the
possible cases:

a. match in the set if the element is not flagged as "nomatch"
b. match in the set if the element is flagged with "nomatch"
c. no match

i.e.

iptables ... -m set --match-set ... -j ...
iptables ... -m set --match-set ... --nomatch-entries -j ...
...

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22 22:44:34 +02:00
Jozsef Kadlecsik 3ace95c0ac netfilter: ipset: Coding style fixes
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22 22:44:29 +02:00
Eric W. Biederman 15e473046c netlink: Rename pid to portid to avoid confusion
It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 15:30:41 -04:00
Tomasz Bursztyka d31f4d448f netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent
This patch fixes a crash if that ipset command is sent over nfnetlink.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-29 13:04:04 +02:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
David S. Miller 7cf7899d9e ipset: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:41 -04:00
Jan Engelhardt c15f1c8325 netfilter: ipset: use NFPROTO_ constants
ipset is actually using NFPROTO values rather than AF (xt_set passes
that along).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:29 +01:00
Pablo Neira Ayuso 80d326fab5 netlink: add netlink_dump_control structure for netlink_dump_start()
Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:10:06 -05:00
Jozsef Kadlecsik be94db9dda netfilter: ipset: dumping error triggered removing references twice
If there was a dumping error in the middle, the set-specific variable was
not zeroed out and thus the 'done' function of the dumping wrongly tried
to release the already released reference of the set. The already released
reference was caught by __ip_set_put and triggered a kernel BUG message.
Reported by Jean-Philippe Menil.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-17 10:52:55 +01:00
Jozsef Kadlecsik 088067f4f1 netfilter: ipset: autoload set type modules safely
Jan Engelhardt noticed when userspace requests a set type unknown
to the kernel, it can lead to a loop due to the unsafe type module
loading. The issue is fixed in this patch.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-17 10:52:46 +01:00
Joe Perches 0a9ee81349 netfilter: Remove unnecessary OOM logging messages
Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:49 +01:00
Jesper Juhl dec17b7451 Remove redundant linux/version.h includes from net/
It was suggested by "make versioncheck" that the follwing includes of
linux/version.h are redundant:

  /home/jj/src/linux-2.6/net/caif/caif_dev.c: 14 linux/version.h not needed.
  /home/jj/src/linux-2.6/net/caif/chnl_net.c: 10 linux/version.h not needed.
  /home/jj/src/linux-2.6/net/ipv4/gre.c: 19 linux/version.h not needed.
  /home/jj/src/linux-2.6/net/netfilter/ipset/ip_set_core.c: 20 linux/version.h not needed.
  /home/jj/src/linux-2.6/net/netfilter/xt_set.c: 16 linux/version.h not needed.

and it seems that it is right.

Beyond manually inspecting the source files I also did a few build
tests with various configs to confirm that including the header in
those files is indeed not needed.

Here's a patch to remove the pointless includes.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-21 16:03:17 -07:00
Jozsef Kadlecsik 15b4d93f03 netfilter: ipset: whitespace and coding fixes detected by checkpatch.pl
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 19:01:26 +02:00
Jozsef Kadlecsik 9d8832320f netfilter: ipset: fix return code for destroy when sets are in use
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:57:44 +02:00
Jozsef Kadlecsik b66554cf03 netfilter: ipset: add xt_action_param to the variant level kadt functions, ipset API change
With the change the sets can use any parameter available for the match
and target extensions, like input/output interface. It's required for
the hash:net,iface set type.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:56:47 +02:00
Jozsef Kadlecsik f1e00b3979 netfilter: ipset: set type support with multiple revisions added
A set type may have multiple revisions, for example when syntax is
extended. Support continuous revision ranges in set types.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:51:41 +02:00
Jozsef Kadlecsik 3d14b171f0 netfilter: ipset: fix adding ranges to hash types
When ranges are added to hash types, the elements may trigger rehashing
the set. However, the last successfully added element was not kept track
so the adding started again with the first element after the rehashing.

Bug reported by Mr Dash Four.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:49:17 +02:00
Jozsef Kadlecsik c1e2e04388 netfilter: ipset: support listing setnames and headers too
Current listing makes possible to list sets with full content only.
The patch adds support partial listings, i.e. listing just
the existing setnames or listing set headers, without set members.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:47:07 +02:00
Jozsef Kadlecsik ac8cc925d3 netfilter: ipset: options and flags support added to the kernel API
The support makes possible to specify the timeout value for
the SET target and a flag to reset the timeout for already existing
entries.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-06-16 18:42:40 +02:00
Greg Rose c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Jozsef Kadlecsik 9184a9cba6 netfilter: ipset: fix ip_set_flush return code
ip_set_flush returned -EPROTO instead of -IPSET_ERR_PROTOCOL, fixed

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26 19:08:11 +02:00
David S. Miller 0b0dc0f17f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-04-19 11:28:35 -07:00
Jozsef Kadlecsik a8a8a0937e netfilter: ipset: Fix the order of listing of sets
A restoreable saving of sets requires that list:set type of sets
come last and the code part which should have taken into account
the ordering was broken. The patch fixes the listing order.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-19 15:59:15 +02:00
Linus Torvalds c44eaf41a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net: Add support for SMSC LAN9530, LAN9730 and LAN89530
  mlx4_en: Restoring RX buffer pointer in case of failure
  mlx4: Sensing link type at device initialization
  ipv4: Fix "Set rt->rt_iif more sanely on output routes."
  MAINTAINERS: add entry for Xen network backend
  be2net: Fix suspend/resume operation
  be2net: Rename some struct members for clarity
  pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
  dsa/mv88e6131: add support for mv88e6085 switch
  ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
  be2net: Fix a potential crash during shutdown.
  bna: Fix for handling firmware heartbeat failure
  can: mcp251x: Allow pass IRQ flags through platform data.
  smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
  iwlwifi: accept EEPROM version 0x423 for iwl6000
  rt2x00: fix cancelling uninitialized work
  rtlwifi: Fix some warnings/bugs
  p54usb: IDs for two new devices
  wl12xx: fix potential buffer overflow in testmode nvs push
  zd1211rw: reset rx idle timer from tasklet
  ...
2011-04-11 07:27:24 -07:00
Jozsef Kadlecsik 2f9f28b212 netfilter: ipset: references are protected by rwlock instead of mutex
The timeout variant of the list:set type must reference the member sets.
However, its garbage collector runs at timer interrupt so the mutex
protection of the references is a no go. Therefore the reference protection
is converted to rwlock.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:19:25 +02:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Jozsef Kadlecsik 6604271c5b netfilter: ipset: References are protected by rwlock instead of mutex
The timeout variant of the list:set type must reference the member sets.
However, its garbage collector runs at timer interrupt so the mutex protection
of the references is a no go. Therefore the reference protection
is converted to rwlock.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-29 15:00:43 +02:00
Jozsef Kadlecsik 5c1aba4678 netfilter: ipset: fix checking the type revision at create command
The revision of the set type was not checked at the create command: if the
userspace sent a valid set type but with not supported revision number,
it'd create a loop.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20 15:35:01 +01:00
Shan Wei 9846ada138 netfilter: ipset: fix the compile warning in ip_set_create
net/netfilter/ipset/ip_set_core.c:615: warning: ‘clash’ may be used uninitialized in this function

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-08 15:37:27 +01:00
Jozsef Kadlecsik 5f52bc3cdd netfilter: ipset: send error message manually
When a message carries multiple commands and one of them triggers
an error, we have to report to the userspace which one was that.
The line number of the command plays this role and there's an attribute
reserved in the header part of the message to be filled out with the error
line number. In order not to modify the original message received from
the userspace, we construct a new, complete netlink error message and
modifies the attribute there, then send it.
Netlink is notified not to send its ACK/error message.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02 23:56:00 +01:00
Patrick McHardy 8da560ced5 netfilter: ipset: use nla_parse_nested()
Replace calls of the form:

nla_parse(tb, ATTR_MAX, nla_data(attr), nla_len(attr), policy)

by:

nla_parse_nested(tb, ATTR_MAX, attr, policy)

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:27:25 +01:00
Jozsef Kadlecsik a7b4f989a6 netfilter: ipset: IP set core support
The patch adds the IP set core support to the kernel.

The IP set core implements a netlink (nfnetlink) based protocol by which
one can create, destroy, flush, rename, swap, list, save, restore sets,
and add, delete, test elements from userspace. For simplicity (and backward
compatibilty and for not to force ip(6)tables to be linked with a netlink
library) reasons a small getsockopt-based protocol is also kept in order
to communicate with the ip(6)tables match and target.

The netlink protocol passes all u16, etc values in network order with
NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the
NLA_F_NESTED and NLA_F_NET_BYTEORDER flags.

For other kernel subsystems (netfilter match and target) the API contains
the functions to add, delete and test elements in sets and the required calls
to get/put refereces to the sets before those operations can be performed.

The set types (which are implemented in independent modules) are stored
in a simple RCU protected list. A set type may have variants: for example
without timeout or with timeout support, for IPv4 or for IPv6. The sets
(i.e. the pointers to the sets) are stored in an array. The sets are
identified by their index in the array, which makes possible easy and
fast swapping of sets. The array is protected indirectly by the nfnl
mutex from nfnetlink. The content of the sets are protected by the rwlock
of the set.

There are functional differences between the add/del/test functions
for the kernel and userspace:

- kernel add/del/test: works on the current packet (i.e. one element)
- kernel test: may trigger an "add" operation  in order to fill
  out unspecified parts of the element from the packet (like MAC address)
- userspace add/del: works on the netlink message and thus possibly
  on multiple elements from the IPSET_ATTR_ADT container attribute.
- userspace add: may trigger resizing of a set

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:28:35 +01:00