linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-31 00:17:44 +00:00

Author	SHA1	Message	Date
David S. Miller	b2d6cee117	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-11 20:53:22 -04:00
Mathias Krause	565f0fa902	xfrm: use a dedicated slab cache for struct xfrm_state struct xfrm_state is rather large (768 bytes here) and therefore wastes quite a lot of memory as it falls into the kmalloc-1024 slab cache, leaving 256 bytes of unused memory per XFRM state object -- a net waste of 25%. Using a dedicated slab cache for struct xfrm_state reduces the level of internal fragmentation to a minimum. On my configuration SLUB chooses to create a slab cache covering 4 pages holding 21 objects, resulting in an average memory waste of ~13 bytes per object -- a net waste of only 1.6%. In my tests this led to memory savings of roughly 2.3MB for 10k XFRM states. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-05-04 10:14:00 +02:00
Steffen Klassert	b48c05ab5d	xfrm: Fix warning in xfrm6_tunnel_net_exit. We need to make sure that all states are really deleted before we check that the state lists are empty. Otherwise we trigger a warning. Fixes: `baeb0dbbb5` ("xfrm6_tunnel: exit_net cleanup check added") Reported-and-tested-by:syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-04-16 07:50:09 +02:00
Steffen Klassert	19d7df69fd	xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems We don't have a compat layer for xfrm, so userspace and kernel structures have different sizes in this case. This results in a broken configuration, so refuse to configure socket policies when trying to insert from 32 bit userspace as we do it already with policies inserted via netlink. Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-02-02 09:23:23 +01:00
David S. Miller	955bd1d216	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-24 23:44:15 -05:00
Gustavo A. R. Silva	545d8ae7af	xfrm: fix boolean assignment in xfrm_get_type_offload Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Fixes: `ffdb5211da` ("xfrm: Auto-load xfrm offload modules") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-01-23 10:56:36 +01:00
Yossi Kuperman	cc01572e2f	xfrm: Add SA to hardware at the end of xfrm_state_construct() Current code configures the hardware with a new SA before the state has been fully initialized. During this time interval, an incoming ESP packet can cause a crash due to a NULL dereference. More specifically, xfrm_input() considers the packet as valid, and yet, anti-replay mechanism is not initialized. Move hardware configuration to the end of xfrm_state_construct(), and mark the state as valid once the SA is fully initialized. Fixes: `d77e38e612` ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Aviad Yehezkel <aviadye@mellnaox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2018-01-18 11:09:29 +01:00
David S. Miller	c02b3741eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-01-17 00:10:42 -05:00
Sabrina Dubroca	2f10a61cee	xfrm: fix rcu usage in xfrm_get_type_offload request_module can sleep, thus we cannot hold rcu_read_lock() while calling it. The function also jumps back and takes rcu_read_lock() again (in xfrm_state_get_afinfo()), resulting in an imbalance. This codepath is triggered whenever a new offloaded state is created. Fixes: `ffdb5211da` ("xfrm: Auto-load xfrm offload modules") Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-12-31 16:29:24 +01:00
Herbert Xu	257a4b018d	xfrm: Forbid state updates from changing encap type Currently we allow state updates to competely replace the contents of x->encap. This is bad because on the user side ESP only sets up header lengths depending on encap_type once when the state is first created. This could result in the header lengths getting out of sync with the actual state configuration. In practice key managers will never do a state update to change the encapsulation type. Only the port numbers need to be changed as the peer NAT entry is updated. Therefore this patch adds a check in xfrm_state_update to forbid any changes to the encap_type. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-12-30 09:18:47 +01:00
David S. Miller	6bb8824732	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net net/ipv6/ip6_gre.c is a case of parallel adds. include/trace/events/tcp.h is a little bit more tricky. The removal of in-trace-macro ifdefs in 'net' paralleled with moving show_tcp_state_name and friends over to include/trace/events/sock.h in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-29 15:42:26 -05:00
David S. Miller	65bbbf6c20	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-12-22 1) Check for valid id proto in validate_tmpl(), otherwise we may trigger a warning in xfrm_state_fini(). From Cong Wang. 2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute. From Michal Kubecek. 3) Verify the state is valid when encap_type < 0, otherwise we may crash on IPsec GRO . From Aviv Heller. 4) Fix stack-out-of-bounds read on socket policy lookup. We access the flowi of the wrong address family in the IPv4 mapped IPv6 case, fix this by catching address family missmatches before we do the lookup. 5) fix xfrm_do_migrate() with AEAD to copy the geniv field too. Otherwise the state is not fully initialized and migration fails. From Antony Antony. 6) Fix stack-out-of-bounds with misconfigured transport mode policies. Our policy template validation is not strict enough. It is possible to configure policies with transport mode template where the address family of the template does not match the selectors address family. Fix this by refusing such a configuration, address family can not change on transport mode. 7) Fix a policy reference leak when reusing pcpu xdst entry. From Florian Westphal. 8) Reinject transport-mode packets through tasklet, otherwise it is possible to reate a recursion loop. From Herbert Xu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-27 10:58:23 -05:00
Antony Antony	75bf50f4aa	xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM) copy geniv when cloning the xfrm state. x->geniv was not copied to the new state and migration would fail. xfrm_do_migrate .. xfrm_state_clone() .. .. esp_init_aead() crypto_alloc_aead() crypto_alloc_tfm() crypto_find_alg() return EAGAIN and failed Signed-off-by: Antony Antony <antony@phenome.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-12-08 07:39:30 +01:00
Lorenzo Colitti	be8f8284cd	net: xfrm: allow clearing socket xfrm policies. Currently it is possible to add or update socket policies, but not clear them. Therefore, once a socket policy has been applied, the socket cannot be used for unencrypted traffic. This patch allows (privileged) users to clear socket policies by passing in a NULL pointer and zero length argument to the {IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both the incoming and outgoing policies being cleared. The simple approach taken in this patch cannot clear socket policies in only one direction. If desired this could be added in the future, for example by continuing to pass in a length of zero (which currently is guaranteed to return EMSGSIZE) and making the policy be a pointer to an integer that contains one of the XFRM_POLICY_{IN,OUT} enum values. An alternative would have been to interpret the length as a signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output policy. Tested: https://android-review.googlesource.com/539816 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-11-30 10:53:06 +01:00
Kees Cook	e99e88a9d2	treewide: setup_timer() -> timer_setup() This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something ptr = (struct something )data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list t) { struct something ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something ptr = (struct something )data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list t) { struct something ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); \| -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); \| -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); \| -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); \| _E->_timer@_stl.function = _callback; \| _E->_timer@_stl.function = &_callback; \| _E->_timer@_stl.function = (_cast_func)_callback; \| _E->_timer@_stl.function = (_cast_func)&_callback; \| _E._timer@_stl.function = _callback; \| _E._timer@_stl.function = &_callback; \| _E._timer@_stl.function = (_cast_func)_callback; \| _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list t ) { ( ... when != _origarg _handletype _handle = -(_handletype )_origarg; +from_timer(_handle, t, _timer); ... when != _origarg \| ... when != _origarg _handletype _handle = -(void )_origarg; +from_timer(_handle, t, _timer); ... when != _origarg \| ... when != _origarg _handletype _handle; ... when != _handle _handle = -(_handletype )_origarg; +from_timer(_handle, t, _timer); ... when != _origarg \| ... when != _origarg _handletype _handle; ... when != _handle _handle = -(void )_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list t ) { + _handletype _origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype )_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list t) { ... } // callback(struct something handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype _handle +struct timer_list t ) { + _handletype _handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list t) { - _handletype _handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); \| -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast \|\| change_callback_handle_cast_no_arg \|\| change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; \| _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; \| _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; \| _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; \| _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; \| _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; \| _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; \| _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast \|\| change_callback_handle_cast_no_arg \|\| change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer \| -(_cast_data)&_E +&_E._timer \| -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); \| -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); \| -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); \| -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); \| -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); \| -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); \| -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); \| -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>	2017-11-21 15:57:07 -08:00
Jonathan Basseri	2b06cdf3e6	xfrm: Clear sk_dst_cache when applying per-socket policy. If a socket has a valid dst cache, then xfrm_lookup_route will get skipped. However, the cache is not invalidated when applying policy to a socket (i.e. IPV6_XFRM_POLICY). The result is that new policies are sometimes ignored on those sockets. (Note: This was broken for IPv4 and IPv6 at different times.) This can be demonstrated like so, 1. Create UDP socket. 2. connect() the socket. 3. Apply an outbound XFRM policy to the socket. (setsockopt) 4. send() data on the socket. Packets will continue to be sent in the clear instead of matching an xfrm or returning a no-match error (EAGAIN). This affects calls to send() and not sendto(). Invalidating the sk_dst_cache is necessary to correctly apply xfrm policies. Since we do this in xfrm_user_policy(), the sk_lock was already acquired in either do_ip_setsockopt() or do_ipv6_setsockopt(), and we may call __sk_dst_reset(). Performance impact should be negligible, since this code is only called when changing xfrm policy, and only affects the socket in question. Fixes: `00bc0ef588` ("ipv6: Skip XFRM lookup if dst_entry in socket cache is valid") Tested: https://android-review.googlesource.com/517555 Tested: https://android-review.googlesource.com/418659 Signed-off-by: Jonathan Basseri <misterikkit@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-10-26 08:19:03 +02:00
Artem Savkov	dd269db849	xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock I might be wrong but it doesn't look like xfrm_state_lock is required for xfrm_policy_cache_flush and calling it under this lock triggers both "sleeping function called from invalid context" and "possible circular locking dependency detected" warnings on flush. Fixes: `ec30d78c14` xfrm: add xdst pcpu cache Signed-off-by: Artem Savkov <asavkov@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-09-28 09:39:05 +02:00
David S. Miller	6026e043d0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-01 17:42:05 -07:00
Koichiro Den	3f5a95ad6c	xfrm: fix null pointer dereference on state and tmpl sort Creating sub policy that matches the same outer flow as main policy does leads to a null pointer dereference if the outer mode's family is ipv4. For userspace compatibility, this patch just eliminates the crash i.e., does not introduce any new sorting rule, which would fruitlessly affect all but the aforementioned case. Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-08-02 13:37:37 +02:00
Ilan Tayari	ffdb5211da	xfrm: Auto-load xfrm offload modules IPSec crypto offload depends on the protocol-specific offload module (such as esp_offload.ko). When the user installs an SA with crypto-offload, load the offload module automatically, in the same way that the protocol module is loaded (such as esp.ko) Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-08-02 11:00:15 +02:00
Florian Westphal	ec30d78c14	xfrm: add xdst pcpu cache retain last used xfrm_dst in a pcpu cache. On next request, reuse this dst if the policies are the same. The cache will not help with strict RR workloads as there is no hit. The cache packet-path part is reasonably small, the notifier part is needed so we do not add long hangs when a device is dismantled but some pcpu xdst still holds a reference, there are also calls to the flush operation when userspace deletes SAs so modules can be removed (there is no hit. We need to run the dst_release on the correct cpu to avoid races with packet path. This is done by adding a work_struct for each cpu and then doing the actual test/release on each affected cpu via schedule_work_on(). Test results using 4 network namespaces and null encryption: ns1 ns2 -> ns3 -> ns4 netperf -> xfrm/null enc -> xfrm/null dec -> netserver what TCP_STREAM UDP_STREAM UDP_RR Flow cache: 14644.61 294.35 327231.64 No flow cache: 14349.81 242.64 202301.72 Pcpu cache: 14629.70 292.21 205595.22 UDP tests used 64byte packets, tests ran for one minute each, value is average over ten iterations. 'Flow cache' is 'net-next', 'No flow cache' is net-next plus this series but without this patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-18 11:13:41 -07:00
Reshetova, Elena	88755e9c7c	net, xfrm: convert xfrm_state.refcnt from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-04 22:35:18 +01:00
David S. Miller	93bbbfbb4a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-06-23 1) Use memdup_user to spmlify xfrm_user_policy. From Geliang Tang. 2) Make xfrm_dev_register static to silence a sparse warning. From Wei Yongjun. 3) Use crypto_memneq to check the ICV in the AH protocol. From Sabrina Dubroca. 4) Remove some unused variables in esp6. From Stephen Hemminger. 5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port. From Antony Antony. 6) Include the UDP encapsulation port to km_migrate announcements. From Antony Antony. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:17:31 -04:00
Antony Antony	8bafd73093	xfrm: add UDP encapsulation port in migrate message Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement to userland. Only add if XFRMA_ENCAP was in user migrate request. Signed-off-by: Antony Antony <antony@phenome.org> Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-06-07 08:35:54 +02:00
Antony Antony	4ab47d47af	xfrm: extend MIGRATE with UDP encapsulation port Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional netlink attribute XFRMA_ENCAP. The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8) could go to sleep for a few minutes and wake up. When it wake up the NAT mapping could have expired, the device send a MOBIKE UPDATE_SA message to migrate the IPsec SA. The change could be a change UDP encapsulation port, IP address, or both. Reported-by: Paul Wouters <pwouters@redhat.com> Signed-off-by: Antony Antony <antony@phenome.org> Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-06-07 08:25:58 +02:00
Antony Antony	a486cd2366	xfrm: fix state migration copy replay sequence numbers During xfrm migration copy replay and preplay sequence numbers from the previous state. Here is a tcpdump output showing the problem. 10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder. After the migration it sent wrong sequence number, reset to 1. The migration is from 10.0.0.52 to 10.0.0.53. IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136 IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136 IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136 IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136 IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136 NOTE: next sequence is wrong 0x1 IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136 IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136 IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136 Signed-off-by: Antony Antony <antony@phenome.org> Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-05-19 12:49:13 +02:00
Geliang Tang	a133d93054	xfrm: use memdup_user Use memdup_user() helper instead of open-coding to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-05-16 07:32:25 +02:00
Steffen Klassert	d77e38e612	xfrm: Add an IPsec hardware offloading API This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-04-14 10:06:10 +02:00
Steffen Klassert	9d389d7f84	xfrm: Add a xfrm type offload. We add a struct xfrm_type_offload so that we have the offloaded codepath separated to the non offloaded codepath. With this the non offloade and the offloaded codepath can coexist. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-04-14 10:05:44 +02:00
Florian Westphal	3819a35fdb	xfrm: fix possible null deref in xfrm_init_tempstate Dan reports following smatch warning: net/xfrm/xfrm_state.c:659 error: we previously assumed 'afinfo' could be null (see line 651) 649 struct xfrm_state_afinfo *afinfo = xfrm_state_afinfo_get_rcu(family); 651 if (afinfo) ... 658 } 659 afinfo->init_temprop(x, tmpl, daddr, saddr); I am resonably sure afinfo cannot be NULL here. xfrm_state4.c and state6.c are both part of ipv4/ipv6 (depends on CONFIG_XFRM, a boolean) but even if ipv6 is a module state6.c can't be removed (ipv6 lacks module_exit so it cannot be removed). The only callers for xfrm6_fini that leads to state backend unregister are error unwinding paths that can be called during ipv6 init function. So after ipv6 module is loaded successfully the state backend cannot go away anymore. The family value from policy lookup path is taken from dst_entry, so that should always be AF_INET(6). However, since this silences the warning and avoids readers of this code wondering about possible null deref it seems preferrable to be defensive and just add the old check back. Fixes: `711059b975` ("xfrm: add and use xfrm_state_afinfo_get_rcu") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-16 08:36:33 +01:00
Florian Westphal	75cda62d9c	xfrm: state: simplify rcu_read_unlock handling in two spots Instead of: if (foo) { unlock(); return bar(); } unlock(); do: unlock(); if (foo) return bar(); This is ok because rcu protected structure is only dereferenced before the conditional. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-10 10:57:14 +01:00
Florian Westphal	711059b975	xfrm: add and use xfrm_state_afinfo_get_rcu xfrm_init_tempstate is always called from within rcu read side section. We can thus use a simpler function that doesn't call rcu_read_lock again. While at it, also make xfrm_init_tempstate return value void, the return value was never tested. A followup patch will replace remaining callers of xfrm_state_get_afinfo with xfrm_state_afinfo_get_rcu variant and then remove the 'old' get_afinfo interface. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-10 10:57:13 +01:00
Florian Westphal	af5d27c4e1	xfrm: remove xfrm_state_put_afinfo commit `44abdc3047` ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made xfrm_state_put_afinfo equivalent to rcu_read_unlock. Use spatch to replace it with direct calls to rcu_read_unlock: @@ struct xfrm_state_afinfo *a; @@ - xfrm_state_put_afinfo(a); + rcu_read_unlock(); old: text data bss dec hex filename 22570 72 424 23066 5a1a xfrm_state.o 1612 0 0 1612 64c xfrm_output.o new: 22554 72 424 23050 5a0a xfrm_state.o 1596 0 0 1596 63c xfrm_output.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-10 10:57:13 +01:00
Florian Westphal	423826a7b1	xfrm: avoid rcu sparse warning xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces) Harmless, but lets fix it to reduce the noise. While at it, get rid of unneeded NULL check, its never hit: net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo); net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo); net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo); ... are the only callsites. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-10 10:57:13 +01:00
Florian Westphal	b3b73b8e6d	xfrm: state: do not acquire lock in get_mtu helpers Once flow cache gets removed the mtu initialisation happens for every skb that gets an xfrm attached, so this lock starts to show up in perf. It is not obvious why this lock is required -- the caller holds reference on the state struct, type->destructor is only called from the state gc worker (all state structs on gc list must have refcount 0). xfrm_init_state already has been called (else private data accessed by type->get_mtu() would not be set up). So just remove the lock -- the race on the state (DEAD?) doesn't matter (could change right after dropping the lock too). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-06 08:44:56 +01:00
Alexander Alemayhu	1365e547c6	xfrm: trivial typos o s/descentant/descendant o s/workarbound/workaround Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-01-04 06:49:28 +01:00
Thomas Gleixner	8b0e195314	ktime: Cleanup ktime_set() usage ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2016-12-25 17:21:22 +01:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Florian Westphal	2258d927a6	xfrm: remove unused helper Not used anymore since 2009 (`9e0d57fd6d`, 'xfrm: SAD entries do not expire correctly after suspend-resume'). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-09-30 08:20:56 +02:00
David S. Miller	1678c1134f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2016-09-23 Only two patches this time: 1) Fix a comment reference to struct xfrm_replay_state_esn. From Richard Guy Briggs. 2) Convert xfrm_state_lookup to rcu, we don't need the xfrm_state_lock anymore in the input path. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-24 08:18:19 -04:00
David S. Miller	d6989d4bbe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-23 06:46:57 -04:00
Florian Westphal	c2f672fc94	xfrm: state lookup can be lockless This is called from the packet input path, we get lock contention if many cpus handle ipsec in parallel. After recent rcu conversion it is safe to call __xfrm_state_lookup without the spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-09-21 12:37:29 +02:00
Ilan Tayari	b588479358	xfrm: Fix memory leak of aead algorithm name commit `1a6509d991` ("[IPSEC]: Add support for combined mode algorithms") introduced aead. The function attach_aead kmemdup()s the algorithm name during xfrm_state_construct(). However this memory is never freed. Implementation has since been slightly modified in commit `ee5c23176f` ("xfrm: Clone states properly on migration") without resolving this leak. This patch adds a kfree() call for the aead algorithm name. Fixes: `1a6509d991` ("[IPSEC]: Add support for combined mode algorithms") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-09-19 12:08:58 +02:00
Florian Westphal	35db57bbc4	xfrm: state: remove per-netns gc task After commit `5b8ef3415a` ("xfrm: Remove ancient sleeping when the SA is in acquire state") gc does not need any per-netns data anymore. As far as gc is concerned all state structs are the same, so we can use a global work struct for it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-24 13:16:06 +02:00
Florian Westphal	d737a58055	xfrm: state: don't use lock anymore unless acquire operation is needed push the lock down, after earlier patches we can rely on rcu to make sure state struct won't go away. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:24 +02:00
Florian Westphal	c8406998b8	xfrm: state: use rcu_deref and assign_pointer helpers Before xfrm_state_find() can use rcu_read_lock instead of xfrm_state_lock we need to switch users of the hash table to assign/obtain the pointers with the appropriate rcu helpers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:24 +02:00
Florian Westphal	b65e3d7be0	xfrm: state: add sequence count to detect hash resizes Once xfrm_state_find is lockless we have to cope with a concurrent resize opertion. We use a sequence counter to block in case a resize is in progress and to detect if we might have missed a state that got moved to a new hash table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:24 +02:00
Florian Westphal	df7274eb70	xfrm: state: delay freeing until rcu grace period has elapsed The hash table backend memory and the state structs are free'd via kfree/vfree. Once we only rely on rcu during lookups we have to make sure no other cpu is currently accessing this before doing the free. Free operations already happen from worker so we can use synchronize_rcu to wait until concurrent readers are done. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00
Florian Westphal	02efdff7e2	xfrm: state: use atomic_inc_not_zero to increment refcount Once xfrm_state_lookup_byaddr no longer acquires the state lock another cpu might be freeing the state entry at the same time. To detect this we use atomic_inc_not_zero, we then signal -EAGAIN to caller in case our result was stale. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00
Florian Westphal	ae3fb6d321	xfrm: state: use hlist_for_each_entry_rcu helper This is required once we allow lockless access of bydst/bysrc hash tables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2016-08-10 11:23:23 +02:00

1 2 3 4 5 ...

284 commits