linux-stable

Commit Graph

Author	SHA1	Message	Date
Jozsef Kadlecsik	07034aeae1	netfilter: ipset: hash:mac type added to ipset Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:21 +02:00
Anton Danilov	76cea4109c	netfilter: ipset: Add skbinfo extension support to SET target. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:21 +02:00
Anton Danilov	cbee93d7b7	netfilter: ipset: Add skbinfo extension kernel support for the list set type. Add skbinfo extension kernel support for the list set type. Introduce the new revision of the list set type. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	af331419d3	netfilter: ipset: Add skbinfo extension kernel support for the hash set types. Add skbinfo extension kernel support for the hash set types. Inroduce the new revisions of all hash set types. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	39d1ecf1ad	netfilter: ipset: Add skbinfo extension kernel support for the bitmap set types. Add skbinfo extension kernel support for the bitmap set types. Inroduce the new revisions of bitmap_ip, bitmap_ipmac and bitmap_port set types. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Anton Danilov	0e9871e3f7	netfilter: ipset: Add skbinfo extension kernel support in the ipset core. Skbinfo extension provides mapping of metainformation with lookup in the ipset tables. This patch defines the flags, the constants, the functions and the structures for the data type independent support of the extension. Note the firewall mark stores in the kernel structures as two 32bit values, but transfered through netlink as one 64bit value. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
Jozsef Kadlecsik	73e64e1813	netfilter: ipset: Fix static checker warning in ip_set_core.c Dan Carpenter reported the following static checker warning: net/netfilter/ipset/ip_set_core.c:1414 call_ad() error: 'nlh->nlmsg_len' from user is not capped properly The payload size is limited now by the max size of size_t. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-09-15 22:20:20 +02:00
David S. Miller	0aac383353	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== nf-next pull request The following patchset contains Netfilter/IPVS updates for your net-next tree. Regarding nf_tables, most updates focus on consolidating the NAT infrastructure and adding support for masquerading. More specifically, they are: 1) use __u8 instead of u_int8_t in arptables header, from Mike Frysinger. 2) Add support to match by skb->pkttype to the meta expression, from Ana Rey. 3) Add support to match by cpu to the meta expression, also from Ana Rey. 4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from Vytas Dauksa. 5) Fix netnet and netportnet hash types the range support for IPv4, from Sergey Popovich. 6) Fix missing-field-initializer warnings resolved, from Mark Rustad. 7) Dan Carperter reported possible integer overflows in ipset, from Jozsef Kadlecsick. 8) Filter out accounting objects in nfacct by type, so you can selectively reset quotas, from Alexey Perevalov. 9) Move specific NAT IPv4 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4. 11) Move specific NAT IPv6 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6. 13) Refactor code to add nft_delrule(), which can be reused in the enhancement of the NFT_MSG_DELTABLE to remove a table and its content, from Arturo Borrero. 14) Add a helper function to unregister chain hooks, from Arturo Borrero. 15) A cleanup to rename to nft_delrule_by_chain for consistency with the new nft_*() functions, also from Arturo. 16) Add support to match devgroup to the meta expression, from Ana Rey. 17) Reduce stack usage for IPVS socket option, from Julian Anastasov. 18) Remove unnecessary textsearch state initialization in xt_string, from Bojan Prtvar. 19) Add several helper functions to nf_tables, more work to prepare the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero. 20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from Arturo Borrero. 21) Support NAT flags in the nat expression to indicate the flavour, eg. random fully, from Arturo. 22) Add missing audit code to ebtables when replacing tables, from Nicolas Dichtel. 23) Generalize the IPv4 masquerading code to allow its re-use from nf_tables, from Arturo. 24) Generalize the IPv6 masquerading code, also from Arturo. 25) Add the new masq expression to support IPv4/IPv6 masquerading from nf_tables, also from Arturo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:46:32 -07:00
Joe Perches	b167a37c7b	netfilter: Convert pr_warning to pr_warn Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:40:10 -07:00
Arturo Borrero	9ba1f726be	netfilter: nf_tables: add new nft_masq expression The nft_masq expression is intended to perform NAT in the masquerade flavour. We decided to have the masquerade functionality in a separated expression other than nft_nat. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:30 +02:00
Arturo Borrero	e42eff8a32	netfilter: nft_nat: include a flag attribute Both SNAT and DNAT (and the upcoming masquerade) can have additional configuration parameters, such as port randomization and NAT addressing persistence. We can cover these scenarios by simply adding a flag attribute for userspace to fill when needed. The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h: NF_NAT_RANGE_MAP_IPS NF_NAT_RANGE_PROTO_SPECIFIED NF_NAT_RANGE_PROTO_RANDOM NF_NAT_RANGE_PERSISTENT NF_NAT_RANGE_PROTO_RANDOM_FULLY NF_NAT_RANGE_PROTO_RANDOM_ALL The caller must take care of not messing up with the flags, as they are added unconditionally to the final resulting nf_nat_range. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:27 +02:00
Arturo Borrero	b9ac12ef09	netfilter: nf_tables: extend NFT_MSG_DELTABLE to support flushing the ruleset This patch extend the NFT_MSG_DELTABLE call to support flushing the entire ruleset. The options now are: * No family speficied, no table specified: flush all the ruleset. * Family specified, no table specified: flush all tables in the AF. * Family specified, table specified: flush the given table. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:26 +02:00
Arturo Borrero	ee01d54256	netfilter: nf_tables: add helpers to schedule objects deletion This patch refactor the code to schedule objects deletion. They are useful in follow-up patches. In order to be able to use these new helper functions in all the code, they are placed in the top of the file, with all the dependant functions and symbols. nft_rule_disactivate_next has been renamed to nft_rule_deactivate. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:25 +02:00
Bojan Prtvar	c435201bed	netfilter: xt_string: Remove unnecessary initialization of struct ts_state The skb_find_text() accepts uninitialized textsearch state variable. Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:25 +02:00
Julian Anastasov	5fcf0cf607	ipvs: reduce stack usage for sockopt data Use union to reserve the required stack space for sockopt data which is less than the currently hardcoded value of 128. Now the tables for commands should be more readable. The checks added for readability are optimized by compiler, others warn at compile time if command uses too much stack or exceeds the storage of set_arglen and get_arglen. As Dan Carpenter points out, we can run for unprivileged user, so we can silent some error messages. Signed-off-by: Julian Anastasov <ja@ssi.bg> CC: Dan Carpenter <dan.carpenter@oracle.com> CC: Andrey Utkin <andrey.krieger.utkin@gmail.com> CC: David Binderman <dcb314@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:24 +02:00
Ana Rey	3045d76070	netfilter: nf_tables: add devgroup support in meta expresion Add devgroup support to let us match device group of a packets incoming or outgoing interface. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:23 +02:00
Arturo Borrero	ce24b7217b	netfilter: nf_tables: rename nf_table_delrule_by_chain() For the sake of homogenize the function naming scheme, let's rename nf_table_delrule_by_chain() to nft_delrule_by_chain(). Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:22 +02:00
Arturo Borrero	c559879406	netfilter: nf_tables: add helper to unregister chain hooks This patch adds a helper function to unregister chain hooks in the chain deletion path. Basically, a code factorization. The new function is useful in follow-up patches. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:21 +02:00
Arturo Borrero	5e266fe7c0	netfilter: nf_tables: refactor rule deletion helper This helper function always schedule the rule to be removed in the following transaction. In follow-up patches, it is interesting to handle separately the logic of rule activation/disactivation from the transaction mechanism. So, this patch simply splits the original nf_tables_delrule_one() in two functions, allowing further control. While at it, for the sake of homigeneize the function naming scheme, let's rename nf_tables_delrule_one() to nft_delrule(). Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:20 +02:00
David S. Miller	eb84d6b604	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-09-07 21:41:53 -07:00
Pablo Neira Ayuso	679ab4ddbd	netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' CONFIG_IPV6=m CONFIG_NETFILTER_XT_TARGET_TPROXY=y net/built-in.o: In function `nf_tproxy_get_sock_v6.constprop.11': >> xt_TPROXY.c:(.text+0x583a1): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `tproxy_tg_init': >> xt_TPROXY.c:(.init.text+0x1dc3): undefined reference to `nf_defrag_ipv6_enable' This fix is similar to `1a5bbfc` ("netfilter: Fix build errors with xt_socket.c"). Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-07 17:25:16 +02:00
Pablo Neira Ayuso	84a59ca55f	netfilter: add explicit Kconfig for NETFILTER_XT_NAT Paul Bolle reports that 'select NETFILTER_XT_NAT' from the IPV4 and IPV6 NAT tables becomes noop since there is no Kconfig switch for it. Add the Kconfig switch to resolve this problem. Fixes: `8993cf8` netfilter: move NAT Kconfig switches out of the iptables scope Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 17:23:31 -07:00
Pablo Neira Ayuso	cbb8125eb4	netfilter: nfnetlink: deliver netlink errors on batch completion We have to wait until the full batch has been processed to deliver the netlink error messages to userspace. Otherwise, we may deliver duplicated errors to userspace in case that we need to abort and replay the transaction if any of the required modules needs to be autoloaded. A simple way to reproduce this (assumming nft_meta is not loaded) with the following test file: add table filter add chain filter test add chain bad test # intentional wrong unexistent table add rule filter test meta mark 0 Then, when trying to load the batch: # nft -f test test:4:1-19: Error: Could not process rule: No such file or directory add chain bad test ^^^^^^^^^^^^^^^^^^^ test:4:1-19: Error: Could not process rule: No such file or directory add chain bad test ^^^^^^^^^^^^^^^^^^^ The error is reported twice, once when the batch is aborted due to missing nft_meta and another when it is fully processed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-03 16:56:23 +02:00
Pablo Neira Ayuso	d99407f42f	netfilter: nft_rbtree: no need for spinlock from set destroy path The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the rbtree and no packets are walking on the set anymore. Thus, we can get rid of the spinlock in the set destroy path there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewied-by: Thomas Graf <tgraf@suug.ch>	2014-09-03 10:57:08 +02:00
Pablo Neira Ayuso	39f390167e	netfilter: nft_hash: no need for rcu in the hash set destroy path The sets are released from the rcu callback, after the rule is removed from the chain list, which implies that nfnetlink cannot update the hashes (thus, no resizing may occur) and no packets are walking on the set anymore. This resolves a lockdep splat in the nft_hash_destroy() path since the nfnl mutex is not held there. =============================== [ INFO: suspicious RCU usage. ] 3.16.0-rc2+ #168 Not tainted ------------------------------- net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by ksoftirqd/0/3: #0: (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7 stack backtrace: CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168 Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006 ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400 ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08 Call Trace: [<ffffffff8142c922>] dump_stack+0x4e/0x68 [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103 [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash] [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Thomas Graf <tgraf@suug.ch>	2014-09-03 10:57:06 +02:00
Pablo Neira Ayuso	d79a61d646	netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_* CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping from 3.16 to 3.17-rc1 if you don't set on the new NF_LOG_IPV4 and NF_LOG_IPV6 switches. Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4 and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled when moving from old to new kernels. Reported-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-01 13:46:31 +02:00
Masanari Iida	1a84db567a	treewide: fix errors in printk This patch fix spelling typo in printk. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-09-01 11:18:25 +02:00
Julian Anastasov	eb90b0c734	ipvs: fix ipv6 hook registration for local replies commit `fc60476761` ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768 Reported-by: Chris J Arges <chris.j.arges@canonical.com> Tested-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-28 10:52:37 +09:00
Julian Anastasov	ea1d5d7755	ipvs: properly declare tunnel encapsulation The tunneling method should properly use tunnel encapsulation. Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum offload is supported. Thanks to Alex Gartrell for reporting the problem, providing solution and for all suggestions. Reported-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-08-27 14:31:56 +09:00
Alexey Perevalov	f111f780ae	netfilter: nfnetlink_acct: add filter support to nfacct counter list/reset You can use this to skip accounting objects when listing/resetting via NFNL_MSG_ACCT_GET/NFNL_MSG_ACCT_GET_CTRZERO messages with the NLM_F_DUMP netlink flag. The filtering covers the following cases: 1. No filter specified. In this case, the client will get old behaviour, 2. List/reset counter object only: In this case, you have to use NFACCT_F_QUOTA as mask and value 0. 3. List/reset quota objects only: You have to use NFACCT_F_QUOTA_PKTS as mask and value - the same, for byte based quota mask should be NFACCT_F_QUOTA_BYTES and value - the same. If you want to obtain the object with any quota type (ie. NFACCT_F_QUOTA_PKTS\|NFACCT_F_QUOTA_BYTES), you need to perform two dump requests, one to obtain NFACCT_F_QUOTA_PKTS objects and another for NFACCT_F_QUOTA_BYTES. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-26 21:36:19 +02:00
Zhouyi Zhou	d1c85c2ebe	netfilter: HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure that the toolchain has the required support in addition to CONFIG_JUMP_LABEL being set. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-25 10:45:28 +02:00
Jozsef Kadlecsik	1b05756c48	netfilter: ipset: Fix warn: integer overflows 'sizeof(map) + size set->dsize' Dan Carpenter reported that the static checker emits the warning net/netfilter/ipset/ip_set_list_set.c:600 init_list_set() warn: integer overflows 'sizeof(map) + size set->dsize' Limit the maximal number of elements in list type of sets. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:33:10 +02:00
Mark Rustad	94729f8a1e	netfilter: ipset: Resolve missing-field-initializer warnings Resolve missing-field-initializer warnings by providing a directed initializer. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:34 +02:00
Sergey Popovich	6e41ee684e	netfilter: ipset: netnet,netportnet: Fix value range support for IPv4 Ranges of values are broken with hash:net,net and hash:net,port,net. hash:net,net ============ # ipset create test-nn hash:net,net # ipset add test-nn 10.0.10.1-10.0.10.127,10.0.0.0/8 # ipset list test-nn Name: test-nn Type: hash:net,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 16960 References: 0 Members: 10.0.10.1,10.0.0.0/8 # ipset test test-nn 10.0.10.65,10.0.0.1 10.0.10.65,10.0.0.1 is NOT in set test-nn. # ipset test test-nn 10.0.10.1,10.0.0.1 10.0.10.1,10.0.0.1 is in set test-nn. hash:net,port,net ================= # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.1-10.0.10.127,tcp:80,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17344 References: 0 Members: 10.0.10.8/29,tcp:80,10.0.0.0 10.0.10.16/28,tcp:80,10.0.0.0 10.0.10.2/31,tcp:80,10.0.0.0 10.0.10.64/26,tcp:80,10.0.0.0 10.0.10.32/27,tcp:80,10.0.0.0 10.0.10.4/30,tcp:80,10.0.0.0 10.0.10.1,tcp:80,10.0.0.0 # ipset list test-npn # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.2 10.0.10.126,tcp:80,10.0.0.2 is NOT in set test-npn. # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is in set test-npn. # ipset create test-npn hash:net,port,net # ipset add test-npn 10.0.10.0/24,tcp:80-81,10.0.0.0/8 # ipset list test-npn Name: test-npn Type: hash:net,port,net Revision: 0 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 17024 References: 0 Members: 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:81,10.0.0.0 # ipset test test-npn 10.0.10.126,tcp:80,10.0.0.0 10.0.10.126,tcp:80,10.0.0.0 is NOT in set test-npn. # ipset test test-npn 10.0.10.0,tcp:80,10.0.0.0 10.0.10.0,tcp:80,10.0.0.0 is in set test-npn. Correctly setup from..to variables where no IPSET_ATTR_IP_TO{,2} attribute is given, so in range processing loop we construct proper cidr value. Check whenever we have no ranges and can short cut in hash:net,net properly. Use unlikely() where appropriate, to comply with other modules. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:32:05 +02:00
Vytas Dauksa	ecc245c2bd	netfilter: ipset: Removed invalid IPSET_ATTR_MARKMASK validation Markmask is an u32, hence it can't be greater then 4294967295 ( i.e. 0xffffffff ). This was causing smatch warning: net/netfilter/ipset/ip_set_hash_gen.h:1084 hash_ipmark_create() warn: impossible condition '(markmask > 4294967295) => (0-u32max > u32max)' Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-08-24 19:31:34 +02:00
Ana Rey	afc5be3079	netfilter: nft_meta: Add cpu attribute support Add cpu support to meta expresion. This allows you to match packets with cpu number. Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:08:46 +02:00
Ana Rey	e2a093ff0d	netfilter: nft_meta: add pkttype support Add pkttype support for ip, ipv6 and inet families of tables. This allows you to fetch the meta packet type based on the link layer information. The loopback traffic is a special case, the packet type is guessed from the network layer header. No special handling for bridge and arp since we're not going to see such traffic in the loopback interface. Joint work with Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-24 14:06:39 +02:00
Daniel Borkmann	8fc54f6891	net: use reciprocal_scale() helper Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 12:21:21 -07:00
Eric Dumazet	d2de875c6d	net: use ktime_get_ns() and ktime_get_real_ns() helpers ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 19:57:23 -07:00
Pablo Neira Ayuso	1e8430f30b	netfilter: nf_tables: nat expression must select CONFIG_NF_NAT This enables the netfilter NAT engine in first place, otherwise you cannot ever select the nf_tables nat expression if iptables is not selected. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-19 21:42:45 +02:00
Daniel Borkmann	caa8ad94ed	netfilter: x_tables: allow to use default cgroup match There's actually no good reason why we cannot use cgroup id 0, so lets just remove this artificial barrier. Reported-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-19 21:38:55 +02:00
Pablo Neira Ayuso	8993cf8edf	netfilter: move NAT Kconfig switches out of the iptables scope Currently, the NAT configs depend on iptables and ip6tables. However, users should be capable of enabling NAT for nft without having to switch on iptables. Fix this by adding new specific IP_NF_NAT and IP6_NF_NAT config switches for iptables and ip6tables NAT support. I have also moved the original NF_NAT_IPV4 and NF_NAT_IPV6 configs out of the scope of iptables to make them independent of it. This patch also adds NETFILTER_XT_NAT which selects the xt_nat combo that provides snat/dnat for iptables. We cannot use NF_NAT anymore since nf_tables can select this. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-18 21:55:54 +02:00
Julia Lawall	609ccf0877	netfilter: nf_tables: fix error return code Convert a zero return value on error to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if ($ret < 0\\|ret != 0$) { ... return ret; } \| ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 16:47:29 +02:00
Pablo Neira Ayuso	7926dbfa4b	netfilter: don't use mutex_lock_interruptible() Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>	2014-08-08 16:47:23 +02:00
Pablo Neira Ayuso	b88825de85	netfilter: nf_tables: don't update chain with unset counters Fix possible replacement of the per-cpu chain counters by null pointer when updating an existing chain in the commit path. Reported-by: Matteo Croce <technoboy85@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 15:38:50 +02:00
Pablo Neira Ayuso	a3716e70e1	netfilter: nf_tables: uninitialize element key/data from the commit path This should happen once the element has been effectively released in the commit path, not before. This fixes a possible chain refcount leak if the transaction is aborted. Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-08 15:38:46 +02:00
David S. Miller	d247b6ab3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 18:46:26 -07:00
Thomas Graf	cfe4a9dda0	nftables: Convert nft_hash to use generic rhashtable The sizing of the hash table and the practice of requiring a lookup to retrieve the pprev to be stored in the element cookie before the deletion of an entry is left intact. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Patrick McHardy <kaber@trash.net> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 19:49:38 -07:00
Alexei Starovoitov	7ae457c1e5	net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_' prefix - everything that is pure BPF is changed to 'bpf_' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern orig_prog; unsigned int (bpf_func)(const struct sk_buff skb, const struct bpf_insn filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter ' and BPF_PROG_RUN to be used with 'struct bpf_prog ' __sk_filter_release(struct sk_filter ) gains __bpf_prog_release(struct bpf_prog ) helper function also perform related renames for the functions that work with 'struct bpf_prog ', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock )/sk_detach_filter(struct sock ) and SK_RUN_FILTER(struct sk_filter , ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog )/bpf_prog_destroy(struct bpf_prog ) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:03:58 -07:00
Thomas Graf	0dc1362562	netfilter: nf_tables: Avoid duplicate call to nft_data_uninit() for same key nft_del_setelem() currently calls nft_data_uninit() twice on the same key. Once to release the key which is guaranteed to be NFT_DATA_VALUE and a second time in the error path to which it falls through. The second call has been harmless so far though because the type passed is always NFT_DATA_VALUE which is currently a no-op. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-08-01 18:14:49 +02:00
David S. Miller	a173e550c2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains netfilter updates for net-next, they are: 1) Add the reject expression for the nf_tables bridge family, this allows us to send explicit reject (TCP RST / ICMP dest unrech) to the packets matching a rule. 2) Simplify and consolidate the nf_tables set dumping logic. This uses netlink control->data to filter out depending on the request. 3) Perform garbage collection in xt_hashlimit using a workqueue instead of a timer, which is problematic when many entries are in place in the tables, from Eric Dumazet. 4) Remove leftover code from the removed ulog target support, from Paul Bolle. 5) Dump unmodified flags in the netfilter packet accounting when resetting counters, so userspace knows that a counter was in overquota situation, from Alexey Perevalov. 6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from Alexey. 7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST attribute. This patchset also includes a couple of cleanups for xt_LED from Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from Himangi Saraogi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 14:09:14 -07:00
Pablo Neira Ayuso	7d5570ca89	netfilter: nf_tables: check for unset NFTA_SET_ELEM_LIST_ELEMENTS attribute Otherwise, the kernel oopses in nla_for_each_nested when iterating over the unset attribute NFTA_SET_ELEM_LIST_ELEMENTS in the nf_tables_{new,del}setelem() path. netlink: 65524 bytes leftover after parsing attributes in process `nft'. [...] Oops: 0000 [#1] SMP [...] CPU: 2 PID: 6287 Comm: nft Not tainted 3.16.0-rc2+ #169 RIP: 0010:[<ffffffffa0526e61>] [<ffffffffa0526e61>] nf_tables_newsetelem+0x82/0xec [nf_tables] [...] Call Trace: [<ffffffffa05178c4>] nfnetlink_rcv+0x2e7/0x3d7 [nfnetlink] [<ffffffffa0517939>] ? nfnetlink_rcv+0x35c/0x3d7 [nfnetlink] [<ffffffff8137d300>] netlink_unicast+0xf8/0x17a [<ffffffff8137d6a5>] netlink_sendmsg+0x323/0x351 [...] Fix this by returning -EINVAL if this attribute is not set, which doesn't make sense at all since those commands are there to add and to delete elements from the set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-31 21:11:43 +02:00
Alexey Perevalov	b6d0468804	netfilter: nfnetlink_acct: avoid using NFACCT_F_OVERQUOTA with bit helper functions Bit helper functions were used for manipulation with NFACCT_F_OVERQUOTA, but they are accepting pit position, but not a bit mask. As a result not a third bit for NFACCT_F_OVERQUOTA was set, but forth. Such behaviour was dangarous and could lead to unexpected overquota report result. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-31 19:55:47 +02:00
David S. Miller	f139c74a8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 13:25:49 -07:00
Alexey Perevalov	d24675cb1f	netfilter: nfnetlink_acct: dump unmodified nfacct flags NFNL_MSG_ACCT_GET_CTRZERO modifies dumped flags, in this case client see unmodified (uncleared) counter value and cleared overquota state - end user doesn't know anything about overquota state, unless end user subscribed on overquota report. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-30 18:16:56 +02:00
Jiri Prchal	8452e6ff3e	netfilter: xt_LED: fix too short led-always-blink If led-always-blink is set, then between switch led OFF and ON is almost zero time. So blink is invisible. This use oneshot led trigger with fixed time 50ms witch is enough to see blink. Signed-off-by: Jiri Prchal <jiri.prchal@aksignal.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 16:09:28 +02:00
Duan Jiong	a2b60c75fa	netfilter: xt_LED: don't output error message redundantly The function led_trigger_register() will only return -EEXIST when error arises. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 14:55:33 +02:00
Eric Dumazet	7bd8490eef	netfilter: xt_hashlimit: perform garbage collection from process context xt_hashlimit cannot be used with large hash tables, because garbage collector is run from a timer. If table is really big, its possible to hold cpu for more than 500 msec, which is unacceptable. Switch to a work queue, and use proper scheduling points to remove latencies spikes. Later, we also could switch to a smoother garbage collection done at lookup time, one bucket at a time... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-24 13:07:25 +02:00
Pablo Neira Ayuso	5b96af7713	netfilter: nf_tables: simplify set dump through netlink This patch uses the cb->data pointer that allows us to store the context when dumping the set list. Thus, we don't need to parse the original netlink message containing the dump request for each recvmsg() call when dumping the set list. The different function flavours depending on the dump criteria has been also merged into one single generic function. This saves us ~100 lines of code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-22 12:08:54 +02:00
David S. Miller	8fd90bb889	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-22 00:44:59 -07:00
David S. Miller	a8138f42d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 21:01:43 -07:00
Alex Gartrell	76f084bc10	ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-17 12:53:54 +09:00
Yannick Brosseau	16ea4c6b9d	ipvs: Remove dead debug code This code section cannot compile as it refer to non existing variable It also pre-date git history. Signed-off-by: Yannick Brosseau <scientist@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 10:07:11 +09:00
Fabian Frederick	b734427a4f	ipvs: remove null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 10:07:01 +09:00
Julian Anastasov	2627b7e15c	ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack commit `8f4e0a1868` ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-07-16 09:39:28 +09:00
Eric Dumazet	ce355e209f	netfilter: nf_tables: 64bit stats need some extra synchronization Use generic u64_stats_sync infrastructure to get proper 64bit stats, even on 32bit arches, at no extra cost for 64bit arches. Without this fix, 32bit arches can have some wrong counters at the time the carry is propagated into upper word. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 12:00:17 +02:00
Pablo Neira Ayuso	38e029f14a	netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale An updater may interfer with the dumping of any of the object lists. Fix this by using a per-net generation counter and use the nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set to notify userspace that it has to restart the dump since an updater has interfered. This patch also replaces the existing consistency checking code in the rule dumping path since it is broken. Basically, the value that the dump callback returns is not propagated to userspace via netlink_dump_start(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 12:00:16 +02:00
Pablo Neira Ayuso	e688a7f8c6	netfilter: nf_tables: safe RCU iteration on list when dumping The dump operation through netlink is not protected by the nfnl_lock. Thus, a reader process can be dumping any of the existing object lists while another process can be updating the list content. This patch resolves this situation by protecting all the object lists with RCU in the netlink dump path which is the reader side. The updater path is already protected via nfnl_lock, so use list manipulation RCU-safe operations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-14 11:20:45 +02:00
Pablo Neira Ayuso	63283dd21e	netfilter: nf_tables: skip transaction if no update flags in tables Skip transaction handling for table updates with no changes in the flags. This fixes a crash when passing the table flag with all bits unset. Reported-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-30 11:44:24 +02:00
Duan Jiong	24de3d3775	netfilter: use IS_ENABLED() macro replace: #if defined(CONFIG_NF_CT_NETLINK) \|\| defined(CONFIG_NF_CT_NETLINK_MODULE) with #if IS_ENABLED(CONFIG_NF_CT_NETLINK) replace: #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE) with #if !IS_ENABLED(CONFIG_NF_NAT) replace: #if !defined(CONFIG_NF_CONNTRACK) && !defined(CONFIG_NF_CONNTRACK_MODULE) with #if !IS_ENABLED(CONFIG_NF_CONNTRACK) And add missing: IS_ENABLED(CONFIG_NF_CT_NETLINK) in net/ipv{4,6}/netfilter/nf_nat_l3proto_ipv{4,6}.c Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-30 11:38:03 +02:00
Fengguang Wu	5cbfda2043	netfilter: nft_log: fix coccinelle warnings net/netfilter/nft_log.c:79:44-45: Unneeded semicolon Removes unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-29 13:55:08 +02:00
Pablo Neira Ayuso	ca1aa54f27	netfilter: xt_LOG: add missing string format in nf_log_packet() net/netfilter/xt_LOG.c: In function 'log_tg': >> net/netfilter/xt_LOG.c:43: error: format not a string literal and no format arguments Fixes: `fab4085` ("netfilter: log: nf_log_packet() as real unified interface") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-28 18:50:35 +02:00
Pablo Neira Ayuso	c1878869c0	netfilter: fix several Kconfig problems in NF_LOG_* warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && IP6_NF_IPTABLES && NETFILTER_ADVANCED) warning: (NF_LOG_IPV4 && NF_LOG_IPV6) selects NF_LOG_COMMON which has unmet direct dependencies (NET && INET && NETFILTER && NF_CONNTRACK) Fixes: `83e96d4` ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-28 18:49:49 +02:00
Pablo Neira Ayuso	09d27b88f1	netfilter: nft_log: complete logging support Use the unified nf_log_packet() interface that allows us explicit logger selection through the nf_loginfo structure. If you specify the group attribute, this means you want to receive logging messages through nfnetlink_log. In that case, the snaplen and qthreshold attributes allows you to tune internal aspects of the netlink logging infrastructure. On the other hand, if the level is specified, then the plain text format through the kernel logging ring is used instead, which is also used by default if neither group nor level are indicated. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:21:37 +02:00
Pablo Neira Ayuso	85d30e2416	netfilter: nft_log: request explicit logger when loading rules This includes the special handling for NFPROTO_INET. There is no real inet logger since we don't see packets of this family. However, rules are loaded using this special family type. So let's just request both IPV4 and IPV6 loggers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:21:30 +02:00
Pablo Neira Ayuso	960649d192	netfilter: bridge: add generic packet logger This adds the generic plain text packet loggger for bridged packets. It routes the logging message to the real protocol packet logger. I decided not to refactor the ebt_log code for two reasons: 1) The ebt_log output is not consistent with the IPv4 and IPv6 Netfilter packet loggers. The output is different for no good reason and it adds redundant code to handle packet logging. 2) To avoid breaking backward compatibility for applications outthere that are parsing the specific ebt_log output, the ebt_log output has been left as is. So only nftables will use the new consistent logging format for logged bridged packets. More decisions coming in this patch: 1) This also removes ebt_log as default logger for bridged packets. Thus, nf_log_packet() routes packet to this new packet logger instead. This doesn't break backward compatibility since nf_log_packet() is not used to log packets in plain text format from anywhere in the ebtables/netfilter bridge code. 2) The new bridge packet logger also performs a lazy request to register the real IPv4, ARP and IPv6 netfilter packet loggers. If the real protocol logger is no available (not compiled or the module is not available in the system, not packet logging happens. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:47 +02:00
Pablo Neira Ayuso	fab4085f4e	netfilter: log: nf_log_packet() as real unified interface Before this patch, the nf_loginfo parameter specified the logging configuration in case the specified default logger was loaded. This patch updates the semantics of the nf_loginfo parameter in nf_log_packet() which now indicates the logger that you explicitly want to use. Thus, nf_log_packet() is exposed as an unified interface which internally routes the log message to the corresponding logger type by family. The module dependencies are expressed by the new nf_logger_find_get() and nf_logger_put() functions which bump the logger module refcount. Thus, you can not remove logger modules that are used by rules anymore. Another important effect of this change is that the family specific module is only loaded when required. Therefore, xt_LOG and nft_log will just trigger the autoload of the nf_log_{ip,ip6} modules according to the family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:13 +02:00
Pablo Neira Ayuso	83e96d443b	netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files The plain text logging is currently embedded into the xt_LOG target. In order to be able to use the plain text logging from nft_log, as a first step, this patch moves the family specific code to the following files and Kconfig symbols: 1) net/ipv4/netfilter/nf_log_ip.c: CONFIG_NF_LOG_IPV4 2) net/ipv6/netfilter/nf_log_ip6.c: CONFIG_NF_LOG_IPV6 3) net/netfilter/nf_log_common.c: CONFIG_NF_LOG_COMMON These new modules will be required by xt_LOG and nft_log. This patch is based on original patch from Arturo Borrero Gonzalez. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:19:59 +02:00
Pablo Neira Ayuso	27fd8d90c9	netfilter: nf_log: move log buffering to core logging This patch moves Eric Dumazet's log buffer implementation from the xt_log.h header file to the core net/netfilter/nf_log.c. This also includes the renaming of the structure and functions to avoid possible undesired namespace clashes. This change allows us to use it from the arp and bridge packet logging implementation in follow up patches.	2014-06-25 19:28:43 +02:00
Pablo Neira Ayuso	5962815a6a	netfilter: nf_log: use an array of loggers instead of list Now that legacy ulog targets are not available anymore in the tree, we can have up to two possible loggers: 1) The plain text logging via kernel logging ring. 2) The nfnetlink_log infrastructure which delivers log messages to userspace. This patch replaces the list of loggers by an array of two pointers per family for each possible logger and it also introduces a new field to the nf_logger structure which indicates the position in the logger array (based on the logger type). This prepares a follow up patch that consolidates the nf_log_packet() interface by allowing to specify the logger as parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:28:43 +02:00
Florian Westphal	9500507c61	netfilter: conntrack: remove timer from ecache extension This brings the (per-conntrack) ecache extension back to 24 bytes in size (was 152 byte on x86_64 with lockdep on). When event delivery fails, re-delivery is attempted via work queue. Redelivery is attempted at least every 0.1 seconds, but can happen more frequently if userspace is not congested. The nf_ct_release_dying_list() function is removed. With this patch, ownership of the to-be-redelivered conntracks (on-dying-list-with-DYING-bit not yet set) is with the work queue, which will release the references once event is out. Joint work with Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:15:38 +02:00
Eric Dumazet	f6b50824f7	netfilter: x_tables: xt_free_table_info() cleanup kvfree() helper can make xt_free_table_info() much cleaner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 14:52:16 +02:00
Fabian Frederick	397304b52d	netfilter: ctnetlink: remove null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 14:49:38 +02:00
Florian Westphal	945b2b2d25	netfilter: nf_nat: fix oops on netns removal Quoting Samu Kallio: Basically what's happening is, during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?). When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The bug happens when attempting to free a conntrack entry whose NAT hash 'prev' field points to a slot in the freed hash table (head for that bin). We ignore conntracks with null nat bindings. But this is wrong, as these are in bysource hash table as well. Restore nat-cleaning for the netns-is-being-removed case. bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191 Fixes: `c2d421e171` ('netfilter: nf_nat: fix race when unloading protocol modules') Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com> Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:58:54 +02:00
Ken-ichirou MATSUZAWA	4a001068d7	netfilter: ctnetlink: add zone size to length Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:53:03 +02:00
Pablo Neira Ayuso	98ca74f4d5	Merge branch 'ipvs' Simon Horman says: ==================== Fix for panic due use of tot_stats estimator outside of CONFIG_SYSCTL It has been present since v3.6.39. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:22:33 +02:00
Pablo Neira Ayuso	915136065b	netfilter: nft_nat: don't dump port information if unset Don't include port information attributes if they are unset. Reported-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:14 +02:00
Pablo Neira Ayuso	6403d96254	netfilter: nf_tables: indicate family when dumping set elements Set the nfnetlink header that indicates the family of this element. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:09 +02:00
Pablo Neira Ayuso	3d9b142131	netfilter: nft_compat: call {target, match}->destroy() to cleanup entry Otherwise, the reference to external objects (eg. modules) are not released when the rules are removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:08:04 +02:00
Pablo Neira Ayuso	ac904ac835	netfilter: nf_tables: fix wrong type in transaction when replacing rules In `b380e5c` ("netfilter: nf_tables: add message type to transactions"), I used the wrong message type in the rule replacement case. The rule that is replaced needs to be handled as a deleted rule. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:58 +02:00
Pablo Neira Ayuso	ac34b86197	netfilter: nf_tables: decrement chain use counter when replacing rules Thus, the chain use counter remains with the same value after the rule replacement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:50 +02:00
Pablo Neira Ayuso	a0a7379e16	netfilter: nf_tables: use u32 for chain use counter Since `4fefee5` ("netfilter: nf_tables: allow to delete several objects from a batch"), every new rule bumps the chain use counter. However, this is limited to 16 bits, which means that it will overrun after 2^16 rules. Use a u32 chain counter and check for overflows (just like we do for table objects). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:44 +02:00
Pablo Neira Ayuso	5bc5c30765	netfilter: nf_tables: use RCU-safe list insertion when replacing rules The patch `5e94846` ("netfilter: nf_tables: add insert operation") did not include RCU-safe list insertion when replacing rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 13:07:29 +02:00
Florian Westphal	cd5f336f17	netfilter: ctnetlink: fix refcnt leak in dying/unconfirmed list dumper 'last' keeps track of the ct that had its refcnt bumped during previous dump cycle. Thus it must not be overwritten until end-of-function. Another (unrelated, theoretical) issue: Don't attempt to bump refcnt of a conntrack whose reference count is already 0. Such conntrack is being destroyed right now, its memory is freed once we release the percpu dying spinlock. Fixes: `b7779d06` ('netfilter: conntrack: spinlock per cpu to protect special lists.') Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 12:51:36 +02:00
Pablo Neira Ayuso	266155b2de	netfilter: ctnetlink: fix dumping of dying/unconfirmed conntracks The dumping prematurely stops, it seems the callback argument that indicates that all entries have been dumped is set after iterating on the first cpu list. The dumping also may stop before the entire per-cpu list content is also dumped. With this patch, conntrack -L dying now shows the dying list content again. Fixes: `b7779d06` ("netfilter: conntrack: spinlock per cpu to protect special lists.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-16 12:51:35 +02:00
Julian Anastasov	9802d21e7a	ipvs: stop tot_stats estimator only under CONFIG_SYSCTL The tot_stats estimator is started only when CONFIG_SYSCTL is defined. But it is stopped without checking CONFIG_SYSCTL. Fix the crash by moving ip_vs_stop_estimator into ip_vs_control_net_cleanup_sysctl. The change is needed after commit `14e405461e` ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39. Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-06-13 16:22:25 +09:00
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
Linus Torvalds	b20dcab9d4	LLVMLinux patches for v3.16 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEABECAAYFAlOTY+wACgkQuseO5dulBZXrIgCdFZyXRojufLLKikWEvHjZ3/k5 KsQAnimtcge+62/IX7YwDjWS+xg9Wt3m =yPrI -----END PGP SIGNATURE----- Merge tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel Pull LLVM patches from Behan Webster: "Next set of patches to support compiling the kernel with clang. They've been soaking in linux-next since the last merge window. More still in the works for the next merge window..." * tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel: arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h all: LLVMLinux: Change DWARF flag to support gcc and clang net: netfilter: LLVMLinux: vlais-netfilter crypto: LLVMLinux: aligned-attribute.patch	2014-06-08 12:27:44 -07:00
Linus Torvalds	3f17ea6dea	Merge branch 'next' (accumulated 3.16 merge window patches) into master Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc\|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...	2014-06-08 11:31:16 -07:00
Mark Charlebois	066c6807f7	net: netfilter: LLVMLinux: vlais-netfilter Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in xt_repldata.h with a C99 compliant flexible array member and then calculated offsets to the other struct members. These other members aren't referenced by name in this code, however this patch maintains the same memory layout and padding as was previously accomplished using VLAIS. Had the original structure been ordered differently, with the entries VLA at the end, then it could have been a flexible member, and this patch would have been a lot simpler. However since the data stored in this structure is ultimately exported to userspace, the order of this structure can't be changed. This patch makes no attempt to change the existing behavior, merely the way in which the current layout is accomplished using standard C99 constructs. As such the code can now be compiled with either gcc or clang. This version of the patch removes the trailing alignment that the VLAIS structure would allocate in order to simplify the patch. Author: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>	2014-06-07 11:44:39 -07:00
David S. Miller	6934e79ed1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/nf_tables fixes for net-next This patchset contains fixes for recent updates available in your net-next, they are: 1) Fix double memory allocation for accounting objects that results in a leak, this slipped through with the new quota extension, patch from Mathieu Poirier. 2) Fix broken ordering when adding set element transactions. 3) Make sure that objects are released in reverse order in the abort path, to avoid possible use-after-free when accessing dependencies. 4) Allow to delete several objects (as long as dependencies are fulfilled) by using one batch. This includes changes in the use counter semantics of the nf_tables objects. 5) Fix illegal sleeping allocation from rcu callback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:35:04 -07:00
WANG Cong	4cb28970a2	net: use the new API kvfree() It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 00:49:51 -07:00
David S. Miller	c99f7abf0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 23:32:12 -07:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
Eric Dumazet	73f156a6e8	inetpeer: get rid of ip_id_count Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 11:00:41 -07:00
Pablo Neira Ayuso	31f8441c32	netfilter: nf_tables: atomic allocation in set notifications from rcu callback Use GFP_ATOMIC allocations when sending removal notifications of anonymous sets from rcu callback context. Sleeping in that context is illegal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:38 +02:00
Pablo Neira Ayuso	4fefee570d	netfilter: nf_tables: allow to delete several objects from a batch Three changes to allow the deletion of several objects with dependencies in one transaction, they are: 1) Introduce speculative counter increment/decrement that is undone in the abort path if required, thus we avoid hitting -EBUSY when deleting the chain. The counter updates are reverted in the abort path. 2) Increment/decrement table/chain use counter for each set/rule. We need this to fully rely on the use counters instead of the list content, eg. !list_empty(&chain->rules) which evaluate true in the middle of the transaction. 3) Decrement table use counter when an anonymous set is bound to the rule in the commit path. This avoids hitting -EBUSY when deleting the table that contains anonymous sets. The anonymous sets are released in the nf_tables_rule_destroy path. This should not be a problem since the rule already bumped the use counter of the chain, so the bound anonymous set reflects dependencies through the rule object, which already increases the chain use counter. So the general assumption after this patch is that the use counters are bumped by direct object dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:35 +02:00
Pablo Neira Ayuso	7632667d26	netfilter: nft_rbtree: introduce locking There's no rbtree rcu version yet, so let's fall back on the spinlock to protect the concurrent access of this structure both from user (to update the set content) and kernel-space (in the packet path). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:31 +02:00
Pablo Neira Ayuso	a1cee076f4	netfilter: nf_tables: release objects in reverse order in the abort path The patch `c7c32e7` ("netfilter: nf_tables: defer all object release via rcu") indicates that we always release deleted objects in the reverse order, but that is only needed in the abort path. These are the two possible scenarios when releasing objects: 1) Deletion scenario in the commit path: no need to release objects in the reverse order since userspace already ensures that dependencies are fulfilled), ie. userspace tells us to delete rule -> ... -> rule -> chain -> table. In this case, we have to release the objects in the same order as userspace provided. 2) Deletion scenario in the abort path: we have to iterate in the reverse order to undo what it cannot be added, ie. userspace sent us a batch that includes: table -> chain -> rule -> ... -> rule, and that needs to be partially undone. In this case, we have to release objects in the reverse order to ensure that the set and chain objects point to valid rule and table objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:28 +02:00
Pablo Neira Ayuso	46bbafceb2	netfilter: nf_tables: fix wrong transaction ordering in set elements The transaction needs to be placed at the end of the commit list, otherwise event notifications are reordered and we may crash when releasing object via call_rcu. This problem was introduced in `60319eb` ("netfilter: nf_tables: use new transaction infrastructure to handle elements"). Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:54:25 +02:00
Mathieu Poirier	4c552a64df	netfilter: nfnetlink_acct: Fix memory leak Allocation of memory need only to happen once, that is after the proper checks on the NFACCT_FLAGS have been done. Otherwise the code can return without freeing already allocated memory. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-02 10:46:52 +02:00
David S. Miller	90d0e08e57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next This small patchset contains three accumulated Netfilter/IPVS updates, they are: 1) Refactorize common NAT code by encapsulating it into a helper function, similarly to what we do in other conntrack extensions, from Florian Westphal. 2) A minor format string mismatch fix for IPVS, from Masanari Iida. 3) Add quota support to the netfilter accounting infrastructure, now you can add quotas to accounting objects via the nfnetlink interface and use them from iptables. You can also listen to quota notifications from userspace. This enhancement from Mathieu Poirier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:54:47 -07:00
Peter Christensen	f44a5f45f5	ipvs: Fix panic due to non-linear skb Receiving a ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit `f2edb9f770` ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by: Peter Christensen <pch@ordbogen.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-05-26 10:22:46 +09:00
David S. Miller	54e5c4def0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 00:32:30 -04:00
Daniel Borkmann	b1fcd35cf5	net: filter: let unattached filters use sock_fprog_kern The sk_unattached_filter_create() API is used by BPF filters that are not directly attached or related to sockets, and are used in team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own internal managment of obtaining filter blocks and thus already have them in kernel memory and set up before calling into sk_unattached_filter_create(). As a result, due to __user annotation in sock_fprog, sparse triggers false positives (incorrect type in assignment [different address space]) when filters are set up before passing them to sk_unattached_filter_create(). Therefore, let sk_unattached_filter_create() API use sock_fprog_kern to overcome this issue. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-23 16:48:05 -04:00
David S. Miller	8af750d739	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== Netfilter/nftables updates for net-next The following patchset contains Netfilter/nftables updates for net-next, most relevantly they are: 1) Add set element update notification via netlink, from Arturo Borrero. 2) Put all object updates in one single message batch that is sent to kernel-space. Before this patch only rules where included in the batch. This series also introduces the generic transaction infrastructure so updates to all objects (tables, chains, rules and sets) are applied in an all-or-nothing fashion, these series from me. 3) Defer release of objects via call_rcu to reduce the time required to commit changes. The assumption is that all objects are destroyed in reverse order to ensure that dependencies betweem them are fulfilled (ie. rules and sets are destroyed first, then chains, and finally tables). 4) Allow to match by bridge port name, from Tomasz Bursztyka. This series include two patches to prepare this new feature. 5) Implement the proper set selection based on the characteristics of the data. The new infrastructure also allows you to specify your preferences in terms of memory and computational complexity so the underlying set type is also selected according to your needs, from Patrick McHardy. 6) Several cleanup patches for nft expressions, including one minor possible compilation breakage due to missing mark support, also from Patrick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 12:06:23 -04:00
Pablo Neira Ayuso	c7c32e72cb	netfilter: nf_tables: defer all object release via rcu Now that all objects are released in the reverse order via the transaction infrastructure, we can enqueue the release via call_rcu to save one synchronize_rcu. For small rule-sets loaded via nft -f, it now takes around 50ms less here. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso	128ad3322b	netfilter: nf_tables: remove skb and nlh from context structure Instead of caching the original skbuff that contains the netlink messages, this stores the netlink message sequence number, the netlink portID and the report flag. This helps to prepare the introduction of the object release via call_rcu. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso	35151d840c	netfilter: nf_tables: simplify nf_tables__notify Now that all these function are called from the commit path, we can pass the context structure to reduce the amount of parameters in all of the nf_tables__notify functions. This patch also removes unneeded branches to check for skb, nlh and net that should be always set in the context structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	60319eb1ca	netfilter: nf_tables: use new transaction infrastructure to handle elements Leave the set content in consistent state if we fail to load the batch. Use the new generic transaction infrastructure to achieve this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	55dd6f9307	netfilter: nf_tables: use new transaction infrastructure to handle table This patch speeds up rule-set updates and it also provides a way to revert updates and leave things in consistent state in case that the batch needs to be aborted. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso	e1aaca93ee	netfilter: nf_tables: pass context to nf_tables_updtable() So nf_tables_uptable() only takes one single parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	f75edf5e9c	netfilter: nf_tables: disabling table hooks always succeeds nf_tables_table_disable() always succeeds, make this function void. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	91c7b38dc9	netfilter: nf_tables: use new transaction infrastructure to handle chain This patch speeds up rule-set updates and it also introduces a way to revert chain updates if the batch is aborted. The idea is to store the changes in the transaction to apply that in the commit step. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	ff3cd7b3c9	netfilter: nf_tables: refactor chain statistic routines Add new routines to encapsulate chain statistics allocation and replacement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso	958bee14d0	netfilter: nf_tables: use new transaction infrastructure to handle sets This patch reworks the nf_tables API so set updates are included in the same batch that contains rule updates. This speeds up rule-set updates since we skip a dialog of four messages between kernel and user-space (two on each direction), from: 1) create the set and send netlink message to the kernel 2) process the response from the kernel that contains the allocated name. 3) add the set elements and send netlink message to the kernel. 4) process the response from the kernel (to check for errors). To: 1) add the set to the batch. 2) add the set elements to the batch. 3) add the rule that points to the set. 4) send batch to the kernel. This also introduces an internal set ID (NFTA_SET_ID) that is unique in the batch so set elements and rules can refer to new sets. Backward compatibility has been only retained in userspace, this means that new nft versions can talk to the kernel both in the new and the old fashion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	b380e5c733	netfilter: nf_tables: add message type to transactions The patch adds message type to the transaction to simplify the commit the and abort routines. Yet another step forward in the generalisation of the transaction infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	37082f930b	netfilter: nf_tables: relocate commit and abort routines in the source file Move the commit and abort routines to the bottom of the source code file. This change is required by the follow up patches that add the set, chain and table transaction support. This patch is just a cleanup to access several functions without having to declare their prototypes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	1081d11b08	netfilter: nf_tables: generalise transaction infrastructure This patch generalises the existing rule transaction infrastructure so it can be used to handle set, table and chain object transactions as well. The transaction provides a data area that stores private information depending on the transaction type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso	7c95f6d866	netfilter: nf_tables: deconstify table and chain in context structure The new transaction infrastructure updates the family, table and chain objects in the context structure, so let's deconstify them. While at it, move the context structure initialization routine to the top of the source file as it will be also used from the table and chain routines. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-19 12:06:09 +02:00
Pablo Neira Ayuso	3b084e99a3	netfilter: nf_tables: fix trace of matching non-terminal rule Add the corresponding trace if we have a full match in a non-terminal rule. Note that the traces will look slightly different than in x_tables since the log message after all expressions have been evaluated (contrary to x_tables, that emits it before the target action). This manifests in two differences in nf_tables wrt. x_tables: 1) The rule that enables the tracing is included in the trace. 2) If the rule emits some log message, that is shown before the trace log message. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-15 19:44:20 +02:00
WANG Cong	60ff746739	net: rename local_df to ignore_df As suggested by several people, rename local_df to ignore_df, since it means "ignore df bit if it is set". Cc: Maciej Żenczykowski <maze@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 14:03:41 -04:00
David S. Miller	5f013c9bc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 13:19:14 -04:00
Pablo Neira Ayuso	7e9bc10db2	netfilter: nf_tables: fix missing return trace at the end of non-base chain Display "return" for implicit rule at the end of a non-base chain, instead of when popping chain from the stack. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:11 +02:00
Pablo Neira Ayuso	f7e7e39b21	netfilter: nf_tables: fix bogus rulenum after goto action After returning from the chain that we just went to with no matchings, we get a bogus rule number in the trace. To fix this, we would need to iterate over the list of remaining rules in the chain to update the rule number counter. Patrick suggested to set this to the maximum value since the default base chain policy is the very last action when the processing the base chain is over. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:10 +02:00
Pablo Neira Ayuso	7b9d5ef932	netfilter: nf_tables: fix tracing of the goto action Add missing code to trace goto actions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:33:08 +02:00
Pablo Neira Ayuso	5467a51221	netfilter: nf_tables: fix goto action This patch fixes a crash when trying to access the counters and the default chain policy from the non-base chain that we have reached via the goto chain. Fix this by falling back on the original base chain after returning from the custom chain. While fixing this, kill the inline function to account chain statistics to improve source code readability. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-12 16:32:41 +02:00
Pablo Neira Ayuso	d088be8042	netfilter: nf_tables: reset rule number counter after jump and goto Otherwise we start incrementing the rule number counter from the previous chain iteration. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-10 19:12:04 +02:00
Denys Fedoryshchenko	ecd15dd7e4	netfilter: nfnetlink: Fix use after free when it fails to process batch This bug manifests when calling the nft command line tool without nf_tables kernel support. kernel message: [ 44.071555] Netfilter messages via NETLINK v0.30. [ 44.072253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000119 [ 44.072264] IP: [<ffffffff8171db1f>] netlink_getsockbyportid+0xf/0x70 [ 44.072272] PGD 7f2b74067 PUD 7f2b73067 PMD 0 [ 44.072277] Oops: 0000 [#1] SMP [...] [ 44.072369] Call Trace: [ 44.072373] [<ffffffff8171fd81>] netlink_unicast+0x91/0x200 [ 44.072377] [<ffffffff817206c9>] netlink_ack+0x99/0x110 [ 44.072381] [<ffffffffa004b951>] nfnetlink_rcv+0x3c1/0x408 [nfnetlink] [ 44.072385] [<ffffffff8171fde3>] netlink_unicast+0xf3/0x200 [ 44.072389] [<ffffffff817201ef>] netlink_sendmsg+0x2ff/0x740 [ 44.072394] [<ffffffff81044752>] ? __mmdrop+0x62/0x90 [ 44.072398] [<ffffffff816dafdb>] sock_sendmsg+0x8b/0xc0 [ 44.072403] [<ffffffff812f1af5>] ? copy_user_enhanced_fast_string+0x5/0x10 [ 44.072406] [<ffffffff816dbb6c>] ? move_addr_to_kernel+0x2c/0x50 [ 44.072410] [<ffffffff816db423>] ___sys_sendmsg+0x3c3/0x3d0 [ 44.072415] [<ffffffff811301ba>] ? handle_mm_fault+0xa9a/0xc60 [ 44.072420] [<ffffffff811362d6>] ? mmap_region+0x166/0x5a0 [ 44.072424] [<ffffffff817da84c>] ? __do_page_fault+0x1dc/0x510 [ 44.072428] [<ffffffff812b8b2c>] ? apparmor_capable+0x1c/0x60 [ 44.072435] [<ffffffff817d6e9a>] ? _raw_spin_unlock_bh+0x1a/0x20 [ 44.072439] [<ffffffff816dfc86>] ? release_sock+0x106/0x150 [ 44.072443] [<ffffffff816dc212>] __sys_sendmsg+0x42/0x80 [ 44.072446] [<ffffffff816dc262>] SyS_sendmsg+0x12/0x20 [ 44.072450] [<ffffffff817df616>] system_call_fastpath+0x1a/0x1f Signed-off-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-04 15:14:08 +02:00
Florian Westphal	f768e5bdef	netfilter: add helper for adding nat extension Reduce copy-past a bit by adding a common helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 20:56:22 +02:00
Florian Westphal	fe337ac283	netfilter: ctnetlink: don't add null bindings if no nat requested commit `0eba801b64` tried to fix a race where nat initialisation can happen after ctnetlink-created conntrack has been created. However, it causes the nat module(s) to be loaded needlessly on systems that are not using NAT. Fortunately, we do not have to create null bindings in that case. conntracks injected via ctnetlink always have the CONFIRMED bit set, which prevents addition of the nat extension in nf_nat_ipv4/6_fn(). We only need to make sure that either no nat extension is added or that we've created both src and dst manips. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 20:49:08 +02:00
Mathieu Poirier	683399eddb	netfilter: nfnetlink_acct: Adding quota support to accounting framework nfacct objects already support accounting at the byte and packet level. As such it is a natural extension to add the possiblity to define a ceiling limit for both metrics. All the support for quotas itself is added to nfnetlink acctounting framework to stay coherent with current accounting object management. Quota limit checks are implemented in xt_nfacct filter where statistic collection is already done. Pablo Neira Ayuso has also contributed to this feature. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-29 18:25:14 +02:00
Pablo Neira	4c1f7818e4	netfilter: nf_tables: relax string validation of NFTA_CHAIN_TYPE Use NLA_STRING for consistency with other string attributes in nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-28 16:54:15 +02:00
David S. Miller	a64d90fd96	netfilter: Fix warning in nfnetlink_receive(). net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’: net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable] Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:51:29 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Masanari Iida	1404c3ab98	netfilter: Fix format string mismatch in ip_vs_proto_name() Fix string mismatch in ip_vs_proto_name() Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 14:14:14 +02:00
Tomasz Bursztyka	aa45660c6b	netfilter: nf_tables: Make meta expression core functions public This will be useful to create network family dedicated META expression as for NFPROTO_BRIDGE for instance. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:55:30 +02:00
Tomasz Bursztyka	758dbcecf1	netfilter: nf_tables: Stack expression type depending on their family To ensure family tight expression gets selected in priority to family agnostic ones. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-23 13:51:05 +02:00
Richard Guy Briggs	4f52090052	netlink: have netlink per-protocol bind function return an error code. Have the netlink per-protocol optional bind function return an int error code rather than void to signal a failure. This will enable netlink protocols to perform extra checks including capabilities and permissions verifications when updating memberships in multicast groups. In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind function was moved above the multicast group update to prevent any access to the multicast socket groups before checking with the per-protocol bind function. This will enable the per-protocol bind function to be used to check permissions which could be denied before making them available, and to avoid the messy job of undoing the addition should the per-protocol bind function fail. The netfilter subsystem seems to be the only one currently using the per-protocol bind function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Richard Guy Briggs	bfe4bc71c6	netlink: simplify nfnetlink_bind Remove duplicity and simplify code flow by moving the rcu_read_unlock() above the condition and let the flow control exit naturally at the end of the function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:42:26 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
Patrick McHardy	b855d416dc	netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4 nft_cmp_fast is used for equality comparisions of size <= 4. For comparisions of size < 4 byte a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consequitive bits, this works out fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:38:02 +02:00
Andrey Vagin	ee214d54bf	netfilter: nf_conntrack: initialize net.ct.generation [ 251.920788] INFO: trying to register non-static key. [ 251.921386] the code is fine but needs lockdep annotation. [ 251.921386] turning off the locking correctness validator. [ 251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294 [ 251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 251.921386] 0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd [ 251.921386] ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0 [ 251.921386] ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172 [ 251.921386] Call Trace: [ 251.921386] [<ffffffff816b7ecd>] dump_stack+0x45/0x56 [ 251.921386] [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c [ 251.921386] [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40 [ 251.921386] [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat] [ 251.921386] [<ffffffff810c7272>] lock_acquire+0xa2/0x120 [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack] [ 251.921386] [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4] [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190 [ 251.921386] [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0 [ 251.921386] [<ffffffff815e98f2>] ip_output+0x92/0x100 [ 251.921386] [<ffffffff815e8df9>] ip_local_out+0x29/0x90 [ 251.921386] [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0 [ 251.921386] [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0 [ 251.921386] [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960 [ 251.921386] [<ffffffff81602d82>] tcp_connect+0x812/0x960 [ 251.921386] [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70 [ 251.921386] [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0 [ 251.921386] [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470 [ 251.921386] [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330 [ 251.921386] [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0 [ 251.921386] [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50 [ 251.921386] [<ffffffff8158b157>] SYSC_connect+0xe7/0x120 [ 251.921386] [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0 [ 251.921386] [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0 [ 251.921386] [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10 [ 251.921386] [<ffffffff8158c36e>] SyS_connect+0xe/0x10 [ 251.921386] [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b [ 312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333) [ 312.015097] INFO: Stall ended before state dump start Fixes: `93bb0ceb75` ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:35:28 +02:00
Patrick McHardy	60eb18943b	netfilter: nf_tables: handle more than 8 * PAGE_SIZE set name allocations We currently have a limit of 8 * PAGE_SIZE anonymous sets. Lift that limit by continuing the scan if the entire page is exhausted. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-14 10:31:20 +02:00
Andrey Vagin	8142b227ef	netfilter: nf_conntrack: flush net_gre->keymap_list only from gre helper nf_ct_gre_keymap_flush() removes a nf_ct_gre_keymap object from net_gre->keymap_list and frees the object. But it doesn't clean a reference on this object from ct_pptp_info->keymap[dir]. Then nf_ct_gre_keymap_destroy() may release the same object again. So nf_ct_gre_keymap_flush() can be called only when we are sure that when nf_ct_gre_keymap_destroy will not be called. nf_ct_gre_keymap is created by nf_ct_gre_keymap_add() and the right way to destroy it is to call nf_ct_gre_keymap_destroy(). This patch marks nf_ct_gre_keymap_flush() as static, so this patch can break compilation of third party modules, which use nf_ct_gre_keymap_flush. I'm not sure this is the right way to deprecate this function. [ 226.540793] general protection fault: 0000 [#1] SMP [ 226.541750] Modules linked in: nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre ip_gre ip_tunnel gre ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth tun bridge stp llc ppdev microcode joydev pcspkr serio_raw virtio_console virtio_balloon floppy parport_pc parport pvpanic i2c_piix4 virtio_net drm_kms_helper ttm ata_generic virtio_pci virtio_ring virtio drm i2c_core pata_acpi [last unloaded: ip_tunnel] [ 226.541776] CPU: 0 PID: 49 Comm: kworker/u4:2 Not tainted 3.14.0-rc8+ #101 [ 226.541776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 226.541776] Workqueue: netns cleanup_net [ 226.541776] task: ffff8800371e0000 ti: ffff88003730c000 task.ti: ffff88003730c000 [ 226.541776] RIP: 0010:[<ffffffff81389ba9>] [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP: 0018:ffff88003730dbd0 EFLAGS: 00010a83 [ 226.541776] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800374e6c40 RCX: dead000000200200 [ 226.541776] RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800371e07d0 RDI: ffff8800374e6c40 [ 226.541776] RBP: ffff88003730dbd0 R08: 0000000000000000 R09: 0000000000000000 [ 226.541776] R10: 0000000000000001 R11: ffff88003730d92e R12: 0000000000000002 [ 226.541776] R13: ffff88007a4c42d0 R14: ffff88007aef0000 R15: ffff880036cf0018 [ 226.541776] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 226.541776] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 226.541776] CR2: 00007f07f643f7d0 CR3: 0000000036fd2000 CR4: 00000000000006f0 [ 226.541776] Stack: [ 226.541776] ffff88003730dbe8 ffffffff81389c5d ffff8800374ffbe4 ffff88003730dc28 [ 226.541776] ffffffffa0162a43 ffffffffa01627c5 ffff88007a4c42d0 ffff88007aef0000 [ 226.541776] ffffffffa01651c0 ffff88007a4c45e0 ffff88007aef0000 ffff88003730dc40 [ 226.541776] Call Trace: [ 226.541776] [<ffffffff81389c5d>] list_del+0xd/0x30 [ 226.541776] [<ffffffffa0162a43>] nf_ct_gre_keymap_destroy+0x283/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa01627c5>] ? nf_ct_gre_keymap_destroy+0x5/0x2d0 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0162ab7>] gre_destroy+0x27/0x70 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffffa0117de3>] destroy_conntrack+0x83/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d87>] ? destroy_conntrack+0x27/0x200 [nf_conntrack] [ 226.541776] [<ffffffffa0117d60>] ? nf_conntrack_hash_check_insert+0x2e0/0x2e0 [nf_conntrack] [ 226.541776] [<ffffffff81630142>] nf_conntrack_destroy+0x72/0x180 [ 226.541776] [<ffffffff816300d5>] ? nf_conntrack_destroy+0x5/0x180 [ 226.541776] [<ffffffffa011ef80>] ? kill_l3proto+0x20/0x20 [nf_conntrack] [ 226.541776] [<ffffffffa011847e>] nf_ct_iterate_cleanup+0x14e/0x170 [nf_conntrack] [ 226.541776] [<ffffffffa011f74b>] nf_ct_l4proto_pernet_unregister+0x5b/0x90 [nf_conntrack] [ 226.541776] [<ffffffffa0162409>] proto_gre_net_exit+0x19/0x30 [nf_conntrack_proto_gre] [ 226.541776] [<ffffffff815edf89>] ops_exit_list.isra.1+0x39/0x60 [ 226.541776] [<ffffffff815eecc0>] cleanup_net+0x100/0x1d0 [ 226.541776] [<ffffffff810a608a>] process_one_work+0x1ea/0x4f0 [ 226.541776] [<ffffffff810a6028>] ? process_one_work+0x188/0x4f0 [ 226.541776] [<ffffffff810a64ab>] worker_thread+0x11b/0x3a0 [ 226.541776] [<ffffffff810a6390>] ? process_one_work+0x4f0/0x4f0 [ 226.541776] [<ffffffff810af42d>] kthread+0xed/0x110 [ 226.541776] [<ffffffff8173d4dc>] ? _raw_spin_unlock_irq+0x2c/0x40 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] [<ffffffff8174747c>] ret_from_fork+0x7c/0xb0 [ 226.541776] [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200 [ 226.541776] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [ 226.541776] RIP [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0 [ 226.541776] RSP <ffff88003730dbd0> [ 226.612193] ---[ end trace 985ae23ddfcc357c ]--- Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-08 10:56:12 +02:00
Pablo Neira Ayuso	2fec6bb6f4	netfilter: nf_tables: fix wrong format in request_module() The intended format in request_module is %.s instead of %.s. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:50 +02:00
Pablo Neira Ayuso	a9bdd83656	netfilter: nf_tables: set names cannot be larger than 15 bytes Currently, nf_tables trims off the set name if it exceeeds 15 bytes, so explicitly reject set names that are too large. Reported-by: Giuseppe Longo <giuseppelng@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:44 +02:00
Kirill Tkhai	b8ddd9eac8	netfilter: Add {ipt,ip6t}_osf aliases for xt_osf There are no these aliases, so kernel can not request appropriate match table: $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP iptables: No chain/target/match by that name. setsockopt() requests ipt_osf module, which is not present. Add the aliases. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:22 +02:00
Alexey Perevalov	a00e76349f	netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks This simple modification allows iptables to work with INPUT chain in combination with cgroup module. It could be useful for counting ingress traffic per cgroup with nfacct netfilter module. There were no problems to count the egress traffic that way formerly. It's possible to get classified sk_buff after PREROUTING, due to socket lookup being done in early_demux (tcp_v4_early_demux). Also it works for udp as well. Trivial usage example, assuming we're in the same shell every step and we have enough permissions: 1) Classic net_cls cgroup initialization: mkdir /sys/fs/cgroup/net_cls mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls 2) Set up cgroup for interesting application: mkdir /sys/fs/cgroup/net_cls/wget echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs 3) Create kernel counters: nfacct add wget-cgroup-in iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in nfacct add wget-cgroup-out iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out 4) Network usage: wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz 5) Check results: nfacct list Cgroup approach is being used for the DataUsage (counting & blocking traffic) feature for Samsung's modification of the Tizen OS. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:17 +02:00
Florian Westphal	e00b437b3d	netfilter: connlimit: move lock array out of struct connlimit_data Eric points out that the locks can be global. Moreover, both Jesper and Eric note that using only 32 locks increases false sharing as only two cache lines are used. This increases locks to 256 (16 cache lines assuming 64byte cacheline and 4 bytes per spinlock). Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:13 +02:00
Florian Westphal	e5ac6eafba	netfilter: connlimit: fix UP build cannot use ARRAY_SIZE() if spinlock_t is empty struct. Fixes: `1442e7507d` ("netfilter: connlimit: use keyed locks") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-03 23:52:07 +02:00
Arturo Borrero	d60ce62fb5	netfilter: nf_tables: add set_elem notifications This patch adds set_elems notifications. When a set_elem is added/deleted, all listening peers in userspace will receive the corresponding notification. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>	2014-04-03 12:22:25 +02:00
Patrick McHardy	2c96c25d11	netfilter: nft_hash: use set global element counter instead of private one Now that nf_tables performs global accounting of set elements, it is not needed in the hash type anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:33:55 +02:00
Patrick McHardy	c50b960ccc	netfilter: nf_tables: implement proper set selection The current set selection simply choses the first set type that provides the requested features, which always results in the rbtree being chosen by virtue of being the first set in the list. What we actually want to do is choose the implementation that can provide the requested features and is optimal from either a performance or memory perspective depending on the characteristics of the elements and the preferences specified by the user. The elements are not known when creating a set. Even if we would provide them for anonymous (literal) sets, we'd still have standalone sets where the elements are not known in advance. We therefore need an abstract description of the data charcteristics. The kernel already knows the size of the key, this patch starts by introducing a nested set description which so far contains only the maximum amount of elements. Based on this the set implementations are changed to provide an estimate of the required amount of memory and the lookup complexity class. The set ops have a new callback ->estimate() that is invoked during set selection. It receives a structure containing the attributes known to the kernel and is supposed to populate a struct nft_set_estimate with the complexity class and, in case the size is known, the complete amount of memory required, or the amount of memory required per element otherwise. Based on the policy specified by the user (performance/memory, defaulting to performance) the kernel will then select the best suited implementation. Even if the set implementation would allow to add more than the specified maximum amount of elements, they are enforced since new implementations might not be able to add more than maximum based on which they were selected. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:32:57 +02:00
Patrick McHardy	fe92ca45a1	netfilter: nft_ct: split nft_ct_init() into two functions for get/set For value spanning multiple registers, we need to validate the length of data loads. In order to add this to nft_ct, we need the length from key validation. Split the nft_ct_init() function into two functions for the get and set operations as preparation for that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:29:45 +02:00
Patrick McHardy	d2caa696ad	netfilter: nft_meta: split nft_meta_init() into two functions for get/set For value spanning multiple registers, we need to validate the length of data loads. In order to add this to nft_meta, we need the length from key validation. Split the nft_meta_init() function into two functions for the get and set operations as preparation for that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>	2014-04-02 21:29:25 +02:00
Patrick McHardy	e88e514e1f	netfilter: nft_ct: add missing ifdef for NFT_MARK setting The set operation for ct mark is only valid if CONFIG_NF_CONNTRACK_MARK is enabled. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-02 21:29:07 +02:00
David S. Miller	64c27237a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/marvell/mvneta.c The mvneta.c conflict is a case of overlapping changes, a conversion to devm_ioremap_resource() vs. a conversion to netdev_alloc_pcpu_stats. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-29 18:48:54 -04:00
Zoltan Kiss	36d5fe6a00	core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors skb_zerocopy can copy elements of the frags array between skbs, but it doesn't orphan them. Also, it doesn't handle errors, so this patch takes care of that as well, and modify the callers accordingly. skb_tx_error() is also added to the callers so they will signal the failed delivery towards the creator of the skb. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 15:29:38 -04:00
David S. Miller	3ab428a4c5	netfilter: Add missing vmalloc.h include to nft_hash.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-18 23:12:02 -04:00
Eric Dumazet	d5d20912d3	netfilter: conntrack: Fix UP builds ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an empty structure. Replace it by CONNTRACK_LOCKS Fixes: `93bb0ceb75` ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-17 17:14:45 -04:00
David S. Miller	e86e180b82	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: * cleanup to remove double semicolon from stephen hemminger. * calm down sparse warning in xt_ipcomp, from Fan Du. * nf_ct_labels support for nf_tables, from Florian Westphal. * new macros to simplify rcu dereferences in the scope of nfnetlink and nf_tables, from Patrick McHardy. * Accept queue and drop (including reason for drop) to verdict parsing in nf_tables, also from Patrick. * Remove unused random seed initialization in nfnetlink_log, from Florian Westphal. * Allow to attach user-specific information to nf_tables rules, useful to attach user comments to rule, from me. * Return errors in ipset according to the manpage documentation, from Jozsef Kadlecsik. * Fix coccinelle warnings related to incorrect bool type usage for ipset, from Fengguang Wu. * Add hash:ip,mark set type to ipset, from Vytas Dauksa. * Fix message for each spotted by ipset for each netns that is created, from Ilia Mirkin. * Add forceadd option to ipset, which evicts a random entry from the set if it becomes full, from Josh Hunt. * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu. * Improve conntrack scalability by removing a central spinlock, original work from Eric Dumazet. Jesper Dangaard Brouer took them over to address remaining issues. Several patches to prepare this change come in first place. * Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization on element removal, etc. from Patrick McHardy. * Restore context in the rule deletion path, as we now release rule objects synchronously, from Patrick McHardy. This gets back event notification for anonymous sets. * Fix NAT family validation in nft_nat, also from Patrick. * Improve scalability of xt_connlimit by using an array of spinlocks and by introducing a rb-tree of hashtables for faster lookup of accounted objects per network. This patch was preceded by several patches and refactorizations to accomodate this change including the use of kmem_cache, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-17 15:06:24 -04:00
Florian Westphal	7d08487777	netfilter: connlimit: use rbtree for per-host conntrack obj storage With current match design every invocation of the connlimit_match function means we have to perform (number_of_conntracks % 256) lookups in the conntrack table [ to perform GC/delete stale entries ]. This is also the reason why ____nf_conntrack_find() in perf top has > 20% cpu time per core. This patch changes the storage to rbtree which cuts down the number of ct objects that need testing. When looking up a new tuple, we only test the connections of the host objects we visit while searching for the wanted host/network (or the leaf we need to insert at). The slot count is reduced to 32. Increasing slot count doesn't speed up things much because of rbtree nature. before patch (50kpps rx, 10kpps tx): + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw after (90kpps, 51kpps tx): + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw + 2.36% swapper [xt_connlimit] [k] count_tree Obvious disadvantages to previous version are the increase in code complexity and the increased memory cost. Partially based on Eric Dumazets fq scheduler. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-17 11:11:57 +01:00
Florian Westphal	50e0e9b129	netfilter: connlimit: make same_source_net signed currently returns 1 if they're the same. Make it work like mem/strcmp so it can be used as rbtree search function. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-17 11:11:52 +01:00
Florian Westphal	1442e7507d	netfilter: connlimit: use keyed locks connlimit currently suffers from spinlock contention, example for 4-core system with rps enabled: + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw May allow parallel lookup/insert/delete if the entry is hashed to another slot. With patch: + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock Improved rx processing rate from ~35kpps to ~50 kpps. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-17 11:11:49 +01:00
Eric W. Biederman	57a7744e09	net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:41:36 -04:00
Joe Perches	b80edf0b52	netfilter: Convert uses of __constant_<foo> to <foo> The use of __constant_<foo> has been unnecessary for quite awhile now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-13 14:13:19 +01:00
Florian Westphal	14e1a97776	netfilter: connlimit: use kmem_cache for conn objects We might allocate thousands of these (one object per connection). Use distinct kmem cache to permit simplte tracking on how many objects are currently used by the connlimit match via the sysfs. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-12 13:55:03 +01:00
Florian Westphal	3bcc5fdf1b	netfilter: connlimit: move insertion of new element out of count function Allows easier code-reuse in followup patches. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-12 13:55:02 +01:00
Florian Westphal	d9ec4f1ee2	netfilter: connlimit: improve packet-to-closed-connection logic Instead of freeing the entry from our list and then adding it back again in the 'packet to closing connection' case just keep the matching entry around. Also drop the found_ct != NULL test as nf_ct_tuplehash_to_ctrack is just container_of(). Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-12 13:55:01 +01:00
Florian Westphal	15cfd52895	netfilter: connlimit: factor hlist search into new function Simplifies followup patch that introduces separate locks for each of the hash slots. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-12 13:55:01 +01:00
Patrick McHardy	a4c2e8beba	netfilter: nft_nat: fix family validation The family in the NAT expression is basically completely useless since we have it available during runtime anyway. Nevertheless it is used to decide the NAT family, so at least validate it properly. As we don't support cross-family NAT, it needs to match the family of the table the expression exists in. Unfortunately we can't remove it completely since we need to dump it for userspace (sigh), so at least reduce the memory waste. Additionally clean up the module init function by removing useless temporary variables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-08 12:35:19 +01:00
Patrick McHardy	d46f2cd260	netfilter: nft_ct: remove family from struct nft_ct Since we have the context available during destruction again, we can remove the family from the private structure. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-08 12:35:19 +01:00
Patrick McHardy	ab9da5c19f	netfilter: nf_tables: restore notifications for anonymous set destruction Since we have the context available again, we can restore notifications for destruction of anonymous sets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-08 12:35:18 +01:00
Patrick McHardy	62472bcefb	netfilter: nf_tables: restore context for expression destructors In order to fix set destruction notifications and get rid of unnecessary members in private data structures, pass the context to expressions' destructor functions again. In order to do so, replace various members in the nft_rule_trans structure by the full context. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-08 12:35:17 +01:00
Patrick McHardy	a36e901cf6	netfilter: nf_tables: clean up nf_tables_trans_add() argument order The context argument logically comes first, and this is what every other function dealing with contexts does. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-08 12:35:16 +01:00
Patrick McHardy	ce6eb0d7c8	netfilter: nft_hash: bug fixes and resizing The hash set type is very broken and was never meant to be merged in this state. Missing RCU synchronization on element removal, leaking chain refcounts when used as a verdict map, races during lookups, a fixed table size are probably just some of the problems. Luckily it is currently never chosen by the kernel when the rbtree type is also available. Rewrite it to be usable. The new implementation supports automatic hash table resizing using RCU, based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing For RCU-Protected Hash Tables" described in [1]. Resizing doesn't require a second list head in the elements, it works by chosing a hash function that remaps elements to a predictable set of buckets, only resizing by integral factors and - during expansion: linking new buckets to the old bucket that contains elements for any of the new buckets, thereby creating imprecise chains, then incrementally seperating the elements until the new buckets only contain elements that hash directly to them. - during shrinking: linking the hash chains of all old buckets that hash to the same new bucket to form a single chain. Expansion requires at most the number of elements in the longest hash chain grace periods, shrinking requires a single grace period. Due to the requirement of having hash chains/elements linked to multiple buckets during resizing, homemade single linked lists are used instead of the existing list helpers, that don't support this in a clean fashion. As a side effect, the amount of memory required per element is reduced by one pointer. Expansion is triggered when the load factors exceeds 75%, shrinking when the load factor goes below 30%. Both operations are allowed to fail and will be retried on the next insertion or removal if their respective conditions still hold. [1] http://dl.acm.org/citation.cfm?id=2002181.2002192 Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:42:07 +01:00
Jesper Dangaard Brouer	93bb0ceb75	netfilter: conntrack: remove central spinlock nf_conntrack_lock nf_conntrack_lock is a monolithic lock and suffers from huge contention on current generation servers (8 or more core/threads). Perf locking congestion is clear on base kernel: - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh - _raw_spin_lock_bh + 25.33% init_conntrack + 24.86% nf_ct_delete_from_lists + 24.62% __nf_conntrack_confirm + 24.38% destroy_conntrack + 0.70% tcp_packet + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table This patch change conntrack locking and provides a huge performance improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen): Base kernel: 810.405 new conntrack/sec After patch: 2.233.876 new conntrack/sec Notice other floods attack (SYN+ACK or ACK) can easily be deflected using: # iptables -A INPUT -m state --state INVALID -j DROP # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0 Use an array of hashed spinlocks to protect insertions/deletions of conntracks into the hash table. 1024 spinlocks seem to give good results, at minimal cost (4KB memory). Due to lockdep max depth, 1024 becomes 8 if CONFIG_LOCKDEP=y The hash resize is a bit tricky, because we need to take all locks in the array. A seqcount_t is used to synchronize the hash table users with the resizing process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:41:13 +01:00
Jesper Dangaard Brouer	ca7433df3a	netfilter: conntrack: seperate expect locking from nf_conntrack_lock Netfilter expectations are protected with the same lock as conntrack entries (nf_conntrack_lock). This patch split out expectations locking to use it's own lock (nf_conntrack_expect_lock). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:41:01 +01:00
Jesper Dangaard Brouer	e1b207dac1	netfilter: avoid race with exp->master ct Preparation for disconnecting the nf_conntrack_lock from the expectations code. Once the nf_conntrack_lock is lifted, a race condition is exposed. The expectations master conntrack exp->master, can race with delete operations, as the refcnt increment happens too late in init_conntrack(). Race is against other CPUs invoking ->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout or early_drop()). Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero(), and checking if nf_ct_is_dying() (path via nf_ct_delete()). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:40:47 +01:00
Jesper Dangaard Brouer	b7779d06f9	netfilter: conntrack: spinlock per cpu to protect special lists. One spinlock per cpu to protect dying/unconfirmed/template special lists. (These lists are now per cpu, a bit like the untracked ct) Add a @cpu field to nf_conn, to make sure we hold the appropriate spinlock at removal time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:40:38 +01:00
Jesper Dangaard Brouer	b476b72a0f	netfilter: trivial code cleanup and doc changes Changes while reading through the netfilter code. Added hint about how conntrack nf_conn refcnt is accessed. And renamed repl_hash to reply_hash for readability Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:40:04 +01:00
Pablo Neira Ayuso	52af2bfcc0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Via Simon Horman: ==================== * Whitespace cleanup spotted by checkpatch.pl from Tingwei Liu. * Section conflict cleanup, basically removal of one wrong __read_mostly, from Andi Kleen. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-03-07 11:37:11 +01:00
Tingwei Liu	411fd527bc	ipvs: Reduce checkpatch noise in ip_vs_lblc.c Add whitespace after operator and put open brace { on the previous line Cc: Tingwei Liu <liutingwei@hisense.com> Cc: lvs-devel@vger.kernel.org Signed-off-by: Tingwei Liu <tingw.liu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-03-07 12:36:21 +09:00
Andi Kleen	c61b0c1328	sections, ipvs: Remove useless __read_mostly for ipvs genl_ops const __read_mostly does not make any sense, because const data is already read-only. Remove the __read_mostly for the ipvs genl_ops. This avoids a LTO section conflict compile problem. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Patrick McHardy <kaber@trash.net> Cc: lvs-devel@vger.kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-03-07 12:36:21 +09:00
Josh Hunt	07cf8f5ae2	netfilter: ipset: add forceadd kernel support for hash set types Adds a new property for hash set types, where if a set is created with the 'forceadd' option and the set becomes full the next addition to the set may succeed and evict a random entry from the set. To keep overhead low eviction is done very simply. It checks to see which bucket the new entry would be added. If the bucket's pos value is non-zero (meaning there's at least one entry in the bucket) it replaces the first entry in the bucket. If pos is zero, then it continues down the normal add process. This property is useful if you have a set for 'ban' lists where it may not matter if you release some entries from the set early. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:43 +01:00
Ilia Mirkin	6843bc3c56	netfilter: ipset: move registration message to init from net_init Commit `1785e8f473` ("netfiler: ipset: Add net namespace for ipset") moved the initialization print into net_init, which can get called a lot due to namespaces. Move it back into init, reduce to pr_info. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:43 +01:00
Vytas Dauksa	4d0e5c076d	netfilter: ipset: add markmask for hash:ip,mark data type Introduce packet mark mask for hash:ip,mark data type. This allows to set mark bit filter for the ip set. Change-Id: Id8dd9ca7e64477c4f7b022a1d9c1a5b187f1c96e Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:42 +01:00
Vytas Dauksa	3b02b56cd5	netfilter: ipset: add hash:ip,mark data type to ipset Introduce packet mark support with new ip,mark hash set. This includes userspace and kernelspace code, hash:ip,mark set tests and man page updates. The intended use of ip,mark set is similar to the ip:port type, but for protocols which don't use a predictable port number. Instead of port number it matches a firewall mark determined by a layer 7 filtering program like opendpi. As well as allowing or blocking traffic it will also be used for accounting packets and bytes sent for each protocol. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:42 +01:00
Fengguang Wu	9562cf28d1	netfilter: ipset: Add hash: fix coccinelle warnings net/netfilter/ipset/ip_set_hash_netnet.c:115:8-9: WARNING: return of 0/1 in function 'hash_netnet4_data_list' with return type bool /c/kernel-tests/src/cocci/net/netfilter/ipset/ip_set_hash_netnet.c:338:8-9: WARNING: return of 0/1 in function 'hash_netnet6_data_list' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:42 +01:00
Sergey Popovich	35f6e63abe	netfilter: ipset: Follow manual page behavior for SET target on list:set ipset(8) for list:set says: The match will try to find a matching entry in the sets and the target will try to add an entry to the first set to which it can be added. However real behavior is bit differ from described. Consider example: # ipset create test-1-v4 hash:ip family inet # ipset create test-1-v6 hash:ip family inet6 # ipset create test-1 list:set # ipset add test-1 test-1-v4 # ipset add test-1 test-1-v6 # iptables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src # ip6tables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src And then when iptables/ip6tables rule matches packet IPSET target tries to add src from packet to the list:set test-1 where first entry is test-1-v4 and the second one is test-1-v6. For IPv4, as it first entry in test-1 src added to test-1-v4 correctly, but for IPv6 src not added! Placing test-1-v6 to the first element of list:set makes behavior correct for IPv6, but brokes for IPv4. This is due to result, returned from ip_set_add() and ip_set_del() from net/netfilter/ipset/ip_set_core.c when set in list:set equires more parameters than given or address families do not match (which is this case). It seems wrong returning 0 from ip_set_add() and ip_set_del() in this case, as 0 should be returned only when an element successfuly added/deleted to/from the set, contrary to ip_set_test() which returns 0 when no entry exists and >0 when entry found in set. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2014-03-06 09:31:41 +01:00
Pablo Neira Ayuso	0768b3b3d2	netfilter: nf_tables: add optional user data area to rules This allows us to store user comment strings, but it could be also used to store any kind of information that the user application needs to link to the rule. Scratch 8 bits for the new ulen field that indicates the length the user data area. 4 bits from the handle (so it's 42 bits long, according to Patrick, it would last 139 years with 1000 new rules per second) and 4 bits from dlen (so the expression data area is 4K, which seems sufficient by now even considering the compatibility layer). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>	2014-02-27 16:56:00 +01:00
Florian Westphal	39111fd261	netfilter: nfnetlink_log: remove unused code Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-25 11:30:01 +01:00
Patrick McHardy	e0abdadcc6	netfilter: nf_tables: accept QUEUE/DROP verdict parameters Allow userspace to specify the queue number or the errno code for QUEUE and DROP verdicts. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-25 11:29:26 +01:00
Patrick McHardy	67a8fc27cc	netfilter: nf_tables: add nft_dereference() macro Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-25 11:29:23 +01:00
Patrick McHardy	0eb5db7ad3	netfilter: nfnetlink: add rcu_dereference_protected() helpers Add a lockdep_nfnl_is_held() function and a nfnl_dereference() macro for RCU dereferences protected by a NFNL subsystem mutex. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-25 11:29:21 +01:00
Patrick McHardy	3e90ebd3c9	netfilter: ip_set: rename nfnl_dereference()/nfnl_set() The next patch will introduce a nfnl_dereference() macro that actually checks that the appropriate mutex is held and therefore needs a subsystem argument. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-25 11:29:18 +01:00
Florian Westphal	d2bf2f34cc	netfilter: nft_ct: labels get support This also adds NF_CT_LABELS_MAX_SIZE so it can be re-used as BUILD_BUG_ON in nft_ct. At this time, nft doesn't yet support writing to the label area; when this changes the label->words handling needs to be moved out of xt_connlabel.c into nf_conntrack_labels.c. Also removes a useless run-time check: words cannot grow beyond 4 (32 bit) or 2 (64bit) since xt_connlabel enforces a maximum of 128 labels. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-19 11:41:25 +01:00
Pablo Neira Ayuso	2ba436fc02	netfilter: xt_ipcomp: Use ntohs to ease sparse warning 0-DAY kernel build testing backend reported: sparse warnings: (new ones prefixed by >>) >> >> net/netfilter/xt_ipcomp.c:63:26: sparse: restricted __be16 degrades to integer >> >> net/netfilter/xt_ipcomp.c:63:26: sparse: cast to restricted __be32 Fix this by using ntohs without shifting. Tested with: make C=1 CF=-D__CHECK_ENDIAN__ Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-19 11:41:25 +01:00
Pablo Neira Ayuso	0eba801b64	netfilter: ctnetlink: force null nat binding on insert Quoting Andrey Vagin: When a conntrack is created by kernel, it is initialized (sets IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it is added in hashes (__nf_conntrack_hash_insert), so one conntract can't be initialized from a few threads concurrently. ctnetlink can add an uninitialized conntrack (w/o IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up this conntrack and start initialize it concurrently. It's dangerous, because BUG can be triggered from nf_nat_setup_info. Fix this race by always setting up nat, even if no CTA_NAT_ attribute was requested before inserting the ct into the hash table. In absence of CTA_NAT_ attribute, a null binding is created. This alters current behaviour: Before this patch, the first packet matching the newly injected conntrack would be run through the nat table since nf_nat_initialized() returns false. IOW, this forces ctnetlink users to specify the desired nat transformation on ct creation time. Thanks for Florian Westphal, this patch is based on his original patch to address this problem, including this patch description. Reported-By: Andrey Vagin <avagin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2014-02-18 00:13:51 +01:00
Nikolay Aleksandrov	f627ed91d8	netfilter: nf_tables: check if payload length is a power of 2 Add a check if payload's length is a power of 2 when selecting ops. The fast ops were meant for well aligned loads, also this fixes a small bug when using a length of 3 with some offsets which causes only 1 byte to be loaded because the fast ops are chosen. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-17 11:21:17 +01:00
Paul Bolle	06efbd6d56	netfilter: nft_meta: fix typo "CONFIG_NET_CLS_ROUTE" There are two checks for CONFIG_NET_CLS_ROUTE, but the corresponding Kconfig symbol was dropped in v2.6.39. Since the code guards access to dst_entry.tclassid it seems CONFIG_IP_ROUTE_CLASSID should be used instead. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-14 11:37:34 +01:00
Patrick McHardy	ce898ecb5a	netfilter: nft_reject_inet: fix unintended fall-through in switch-statatement For IPv4 packets, we call both IPv4 and IPv6 reject. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-14 11:37:33 +01:00
Rashika Kheria	bd76ed36ba	net: Include appropriate header file in netfilter/nft_lookup.c Include appropriate header file net/netfilter/nf_tables_core.h in net/netfilter/nft_lookup.c because it has prototype declaration of functions defined in net/netfilter/nft_lookup.c. This eliminates the following warning in net/netfilter/nft_lookup.c: net/netfilter/nft_lookup.c:133:12: warning: no previous prototype for ‘nft_lookup_module_init’ [-Wmissing-prototypes] net/netfilter/nft_lookup.c:138:6: warning: no previous prototype for ‘nft_lookup_module_exit’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-09 17:32:50 -08:00
Patrick McHardy	6d8c00d58e	netfilter: nf_tables: unininline nft_trace_packet() It makes no sense to inline a rarely used function meant for debugging only that is called a total of five times in the main evaluation loop. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-07 17:50:27 +01:00
Pablo Neira Ayuso	62f9c8b40d	netfilter: nf_tables: fix loop checking with end interval elements Fix access to uninitialized data for end interval elements. The element data part is uninitialized in interval end elements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-07 17:21:45 +01:00
Pablo Neira Ayuso	2fb91ddbf8	netfilter: nft_rbtree: fix data handling of end interval elements This patch fixes several things which related to the handling of end interval elements: * Chain use underflow with intervals and map: If you add a rule using intervals+map that introduces a loop, the error path of the rbtree set decrements the chain refcount for each side of the interval, leading to a chain use counter underflow. * Don't copy the data part of the end interval element since, this area is uninitialized and this confuses the loop detection code. * Don't allocate room for the data part of end interval elements since this is unused. So, after this patch the idea is that end interval elements don't have a data part. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>	2014-02-07 14:22:06 +01:00
Pablo Neira Ayuso	bd7fc645da	netfilter: nf_tables: do not allow NFT_SET_ELEM_INTERVAL_END flag and data This combination is not allowed since end interval elements cannot contain data. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>	2014-02-07 14:21:49 +01:00
Pablo Neira Ayuso	0165d9325d	netfilter: nf_tables: fix racy rule deletion We may lost race if we flush the rule-set (which happens asynchronously via call_rcu) and we try to remove the table (that userspace assumes to be empty). Fix this by recovering synchronous rule and chain deletion. This was introduced time ago before we had no batch support, and synchronous rule deletion performance was not good. Now that we have the batch support, we can just postpone the purge of old rule in a second step in the commit phase. All object deletions are synchronous after this patch. As a side effect, we save memory as we don't need rcu_head per rule anymore. Cc: Patrick McHardy <kaber@trash.net> Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 11:46:06 +01:00
Patrick McHardy	b8ecbee67c	netfilter: nf_tables: fix log/queue expressions for NFPROTO_INET The log and queue expressions both store the family during ->init() and use it to deliver packets. This is wrong when used in NFPROTO_INET since they should both deliver to the actual AF of the packet, not the dummy NFPROTO_INET. Use the family from the hook ops to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 11:41:38 +01:00
Patrick McHardy	05513e9e33	netfilter: nf_tables: add reject module for NFPROTO_INET Add a reject module for NFPROTO_INET. It does nothing but dispatch to the AF-specific modules based on the hook family. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 09:44:18 +01:00
Patrick McHardy	cc4723ca31	netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts Currently the nft_reject module depends on symbols from ipv6. This is wrong since no generic module should force IPv6 support to be loaded. Split up the module into AF-specific and a generic part. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 09:44:10 +01:00
Patrick McHardy	64d46806b6	netfilter: nf_tables: add AF specific expression support For the reject module, we need to add AF-specific implementations to get rid of incorrect module dependencies. Try to load an AF-specific module first and fall back to generic modules. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:05:36 +01:00
Patrick McHardy	51292c0735	netfilter: nft_ct: fix missing NFT_CT_L3PROTOCOL key in validity checks The key was missing in the list of valid keys, add it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:05:33 +01:00
Patrick McHardy	ec2c993568	netfilter: nf_tables: fix potential oops when dumping sets Commit `c9c8e48597` (netfilter: nf_tables: dump sets in all existing families) changed nft_ctx_init_from_setattr() to only look up the address family if it is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute is given, nftables_afinfo_lookup() will dereference the NULL afi pointer. Fix by checking for non-NULL afi and also move a check added by that commit to the proper position. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-06 00:04:15 +01:00
Patrick McHardy	53b70287dd	netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() The map that is used to allocate anonymous sets is indeed BITS_PER_BYTE * PAGE_SIZE long. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 17:46:07 +01:00
Pablo Neira Ayuso	e53376bef2	netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt With this patch, the conntrack refcount is initially set to zero and it is bumped once it is added to any of the list, so we fulfill Eric's golden rule which is that all released objects always have a refcount that equals zero. Andrey Vagin reports that nf_conntrack_free can't be called for a conntrack with non-zero ref-counter, because it can race with nf_conntrack_find_get(). A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero ref-counter says that this conntrack is used. So when we release a conntrack with non-zero counter, we break this assumption. CPU1 CPU2 ____nf_conntrack_find() nf_ct_put() destroy_conntrack() ... init_conntrack __nf_conntrack_alloc (set use = 1) atomic_inc_not_zero(&ct->use) (use = 2) if (!l4proto->new(ct, skb, dataoff, timeouts)) nf_conntrack_free(ct); (use = 2 !!!) ... __nf_conntrack_alloc (set use = 1) if (!nf_ct_key_equal(h, tuple, zone)) nf_ct_put(ct); (use = 0) destroy_conntrack() /* continue to work with CT */ After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get" another bug was triggered in destroy_conntrack(): <4>[67096.759334] ------------[ cut here ]------------ <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211! ... <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack] <4>[67096.760255] Call Trace: <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30 <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack] <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack] <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4] <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110 <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20 <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910 <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0 <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140 <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140 <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60 <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20 <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40 <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20 <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190 <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200 <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290 <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210 <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 I have reused the original title for the RFC patch that Andrey posted and most of the original patch description. Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Vagin <avagin@parallels.com> Cc: Florian Westphal <fw@strlen.de> Reported-by: Andrew Vagin <avagin@parallels.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-02-05 17:46:06 +01:00
Andrey Vagin	c6825c0976	netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Lets look at destroy_conntrack: hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); ... nf_conntrack_free(ct) kmem_cache_free(net->ct.nf_conntrack_cachep, ct); net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU. The hash is protected by rcu, so readers look up conntracks without locks. A conntrack is removed from the hash, but in this moment a few readers still can use the conntrack. Then this conntrack is released and another thread creates conntrack with the same address and the equal tuple. After this a reader starts to validate the conntrack: * It's not dying, because a new conntrack was created * nf_ct_tuple_equal() returns true. But this conntrack is not initialized yet, so it can not be used by two threads concurrently. In this case BUG_ON may be triggered from nf_nat_setup_info(). Florian Westphal suggested to check the confirm bit too. I think it's right. task 1 task 2 task 3 nf_conntrack_find_get ____nf_conntrack_find destroy_conntrack hlist_nulls_del_rcu nf_conntrack_free kmem_cache_free __nf_conntrack_alloc kmem_cache_alloc memset(&ct->tuplehash[IP_CT_DIR_MAX], if (nf_ct_is_dying(ct)) if (!nf_ct_tuple_equal() I'm not sure, that I have ever seen this race condition in a real life. Currently we are investigating a bug, which is reproduced on a few nodes. In our case one conntrack is initialized from a few tasks concurrently, we don't have any other explanation for this. <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322! ... <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>] [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat] ... <4>[46267.085549] Call Trace: <4>[46267.085622] [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat] <4>[46267.085697] [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat] <4>[46267.085770] [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat] <4>[46267.085843] [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat] <4>[46267.085919] [<ffffffff814841b9>] nf_iterate+0x69/0xb0 <4>[46267.085991] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086063] [<ffffffff81484374>] nf_hook_slow+0x74/0x110 <4>[46267.086133] [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0 <4>[46267.086207] [<ffffffff814b5890>] ? dst_output+0x0/0x20 <4>[46267.086277] [<ffffffff81495204>] ip_output+0xa4/0xc0 <4>[46267.086346] [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910 <4>[46267.086419] [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0 <4>[46267.086491] [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50 <4>[46267.086562] [<ffffffff81444d67>] sock_sendmsg+0x117/0x140 <4>[46267.086638] [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20 <4>[46267.086712] [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40 <4>[46267.086785] [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80 <4>[46267.086858] [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20 <4>[46267.086936] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087006] [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90 <4>[46267.087081] [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0 <4>[46267.087151] [<ffffffff81445599>] sys_sendto+0x139/0x190 <4>[46267.087229] [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0 <4>[46267.087303] [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200 <4>[46267.087378] [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290 <4>[46267.087454] [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210 <4>[46267.087531] [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210 <4>[46267.087607] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5 <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74 <1>[46267.088023] RIP [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:18 +01:00
Patrick McHardy	3dd7279fb6	netfilter: nf_tables: fix oops when deleting a chain with references The following commands trigger an oops: # nft -i nft> add table filter nft> add chain filter input { type filter hook input priority 0; } nft> add chain filter test nft> add rule filter input jump test nft> delete chain filter test We need to check the chain use counter before allowing destruction since we might have references from sets or jump rules. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341 Reported-by: Matthew Ife <deleriux1@gmail.com> Tested-by: Matthew Ife <deleriux1@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:17 +01:00
Arturo Borrero	2a53bfb3e0	netfilter: nft_ct: fix unconditional dump of 'dir' attr We want to make sure that the information that we get from the kernel can be reinjected without troubles. The kernel shouldn't return an attribute that is not required, or even prohibited. Dumping unconditionally NFTA_CT_DIRECTION could lead an application in userspace to interpret that the attribute was originally set, while it was not. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-02-05 13:16:17 +01:00
Michal Kubecek	2a971354e7	ipvs: fix AF assignment in ip_vs_conn_new() If a fwmark is passed to ip_vs_conn_new(), it is passed in vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we may copy only first 4 bytes of an IPv6 address into cp->daddr. Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2014-02-04 21:13:47 +09:00
Linus Torvalds	4ba9920e5e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...	2014-01-25 11:17:34 -08:00
Cody P Schafer	b182837ac1	net/netfilter/ipset/ip_set_hash_netiface.c: use rbtree postorder iteration instead of opencoding Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-23 16:37:03 -08:00
Linus Torvalds	bb1281f2aa	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science stuff from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) neighbour.h: fix comment sched: Fix warning on make htmldocs caused by wait.h slab: struct kmem_cache is protected by slab_mutex doc: Fix typo in USB Gadget Documentation of/Kconfig: Spelling s/one/once/ mkregtable: Fix sscanf handling lp5523, lp8501: comment improvements thermal: rcar: comment spelling treewide: fix comments and printk msgs IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart() Documentation: update /proc/uptime field description Documentation: Fix size parameter for snprintf arm: fix comment header and macro name asm-generic: uaccess: Spelling s/a ny/any/ mtd: onenand: fix comment header doc: driver-model/platform.txt: fix a typo drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text doc: Fix typo (acces_process_vm -> access_process_vm) treewide: Fix typos in printk drivers/gpu/drm/qxl/Kconfig: reformat the help text ...	2014-01-22 21:21:55 -08:00
David S. Miller	5ff1dd2416	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== This small batch contains several Netfilter fixes for your net-next tree, more specifically: * Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set, from Kristian Evensen. * Add dependency to IPV6 for NF_TABLES_INET. This one has been reported by the several robots that are testing .config combinations, from Paul Gortmaker. * Fix default base chain policy setting in nf_tables, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 11:44:54 -08:00
Kristian Evensen	847c8e2959	netfilter: nft_ct: fix compilation warning if NF_CONNTRACK_MARK is not set net/netfilter/nft_ct.c: In function 'nft_ct_set_eval': net/netfilter/nft_ct.c:136:6: warning: unused variable 'value' [-Wunused-variable] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-15 11:00:14 +01:00
Aruna-Hewapathirane	63862b5bef	net: replace macros net_random and net_srandom with direct calls to prandom This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 15:15:25 -08:00
David S. Miller	0a379e21c5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-01-14 14:42:42 -08:00
Paul Gortmaker	419331d8ff	netfilter: Add dependency on IPV6 for NF_TABLES_INET Commit `1d49144c0a` ("netfilter: nf_tables: add "inet" table for IPv4/IPv6") allows creation of non-IPV6 enabled .config files that will fail to configure/link as follows: warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES) net/built-in.o: In function `nft_reject_eval': nft_reject.c:(.text+0x651e8): undefined reference to `nf_ip6_checksum' nft_reject.c:(.text+0x65270): undefined reference to `ip6_route_output' nft_reject.c:(.text+0x656c4): undefined reference to `ip6_dst_hoplimit' make: *** [vmlinux] Error 1 Since the feature is to allow for a mixed IPV4 and IPV6 table, it seems sensible to make it depend on IPV6. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-14 16:33:00 +01:00
David S. Miller	ef8570d859	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== This batch contains one single patch with the l2tp match for xtables, from James Chapman. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-10 14:50:02 -05:00
Pablo Neira Ayuso	8f46df184c	netfilter: nf_tables: fix missing byteorder conversion in policy When fetching the policy attribute, the byteorder conversion was missing, breaking the chain policy setting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-10 18:26:13 +01:00
David S. Miller	751fcac19a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== nf_tables updates for net-next The following patchset contains the following nf_tables updates, mostly updates from Patrick McHardy, they are: * Add the "inet" table and filter chain type for this new netfilter family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6 rules, this should help to simplify the burden in the administration of dual stack firewalls. This also includes several patches to prepare the infrastructure for this new table and a new meta extension to match the layer 3 and 4 protocol numbers, from Patrick McHardy. * Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used in NFPROTO_INET, as we don't certainly know which one would be used, also from Patrick McHardy. * Do not allow to delete a table that contains sets, otherwise these sets become orphan, from Patrick McHardy. * Hold a reference to the corresponding nf_tables family module when creating a table of that family type, to avoid the module deletion when in use, from Patrick McHardy. * Update chain counters before setting the chain policy to ensure that we don't leave the chain in inconsistent state in case of errors (aka. restore chain atomicity). This also fixes a possible leak if it fails to allocate the chain counters if no counters are passed to be restored, from Patrick McHardy. * Don't check for overflows in the table counter if we are just renaming a chain, from Patrick McHardy. * Replay the netlink request after dropping the nfnl lock to load the module that supports provides a chain type, from Patrick. * Fix chain type module references, from Patrick. * Several cleanups, function renames, constification and code refactorizations also from Patrick McHardy. * Add support to set the connmark, this can be used to set it based on the meta mark (similar feature to -j CONNMARK --restore), from Kristian Evensen. * A couple of fixes to the recently added meta/set support and nft_reject, and fix missing chain type unregistration if we fail to register our the family table/filter chain type, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-09 21:36:01 -05:00
Pablo Neira Ayuso	cf4dfa8539	netfilter: nf_tables: fix error path in the init functions We have to unregister chain type if this fails to register netns. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 23:25:48 +01:00
James Chapman	74f77a6b2b	netfilter: introduce l2tp match extension Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2 and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id and session-id, the filtering decision can also include the L2TP packet type (control or data), protocol version (2 or 3) and encapsulation type (UDP or IP). The most common use for this will likely be to filter L2TP data packets of individual L2TP tunnels or sessions. While a u32 match can be used, the L2TP protocol headers are such that field offsets differ depending on bits set in the header, making rules for matching generic L2TP connections cumbersome. This match extension takes care of all that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 21:36:39 +01:00
Patrick McHardy	3876d22dba	netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain() We don't encode argument types into function names and since besides nft_do_chain() there are only AF-specific versions, there is no risk of confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	44a6f0df03	netfilter: nf_tables: prohibit deletion of a table with existing sets We currently leak the set memory when deleting a table that still has sets in it. Return EBUSY when attempting to delete a table with sets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	7047f9d052	netfilter: nf_tables: take AF module reference when creating a table The table refers to data of the AF module, so we need to make sure the module isn't unloaded while the table exists. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	c5c1f975ad	netfilter: nf_tables: perform flags validation before table allocation Simplifies error handling. Additionally use the correct type u32 for the host byte order flags value. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	fa2c1de0bb	netfilter: nf_tables: minor nf_chain_type cleanups Minor nf_chain_type cleanups: - reorder struct to plug a hoe - rename struct module member to "owner" for consistency - rename nf_hookfn array to "hooks" for consistency - reorder initializers for better readability Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	2a37d755b8	netfilter: nf_tables: constify chain type definitions and pointers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00

... 3 4 5 6 7 ...

2905 Commits