Commit graph

1031 commits

Author SHA1 Message Date
Florian Westphal
aff669bc28 xfrm_policy: kill flow to policy dir conversion
XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already
removed this function as its just returns the argument.  Again, no
code change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal
855dad99c0 xfrm_policy: remove always true/false branches
after previous change oldflo and xdst are always NULL.
These branches were already removed by gcc, this doesn't change code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal
3ca28286ea xfrm_policy: bypass flow_cache_lookup
Instead of consulting flow cache, call the xfrm bundle/policy lookup
functions directly.  This pretends the flow cache had no entry.

This helps to gradually remove flow cache integration,
followup commit will remove the dead code that this change adds.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Reshetova, Elena
55eabed60a net, xfrm: convert sec_path.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
Reshetova, Elena
850a6212c6 net, xfrm: convert xfrm_policy.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
Reshetova, Elena
88755e9c7c net, xfrm: convert xfrm_state.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 22:35:18 +01:00
David S. Miller
b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
David S. Miller
93bbbfbb4a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-06-23

1) Use memdup_user to spmlify xfrm_user_policy.
   From Geliang Tang.

2) Make xfrm_dev_register static to silence a sparse warning.
   From Wei Yongjun.

3) Use crypto_memneq to check the ICV in the AH protocol.
   From Sabrina Dubroca.

4) Remove some unused variables in esp6.
   From Stephen Hemminger.

5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
   From Antony Antony.

6) Include the UDP encapsulation port to km_migrate announcements.
   From Antony Antony.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23 14:17:31 -04:00
Wei Wang
a4c2fd7f78 net: remove DST_NOCACHE flag
DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and remove this flag
completely.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
b2a9c0ed75 net: remove DST_NOGC flag
Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behaves as DST_NOCACHE.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
52df157f17 xfrm: take refcnt of dst when creating struct xfrm_dst bundle
During the creation of xfrm_dst bundle, always take ref count when
allocating the dst. This way, xfrm_bundle_create() will form a linked
list of dst with dst->child pointing to a ref counted dst child. And
the returned dst pointer is also ref counted. This makes the link from
the flow cache to this dst now ref counted properly.
As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() and its related
function calls should be replaced with dst_release() or
dst_release_immediate().

The special handling logic for dst->child in dst_destroy() can be
replaced with a simple dst_release_immediate() call on the child to
release the whole list linked by dst->child pointer.
Previously used DST_NOHASH flag is not needed anymore as well. The
reason that DST_NOHASH is used in the existing code is mainly to prevent
the dst inserted in the fib tree to be wrongly destroyed during the
deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
flag is marked in all the dst children except the one which is in the
fib tree.
However, with this patch series to remove dst gc logic and release dst
only based on ref count, it is safe to release all the children from a
xfrm_dst bundle as long as the dst children are all ref counted
properly which is already the case in the existing code.
So, this patch removes the use of DST_NOHASH flag.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Hangbin Liu
138437f591 xfrm: move xfrm_garbage_collect out of xfrm_policy_flush
Now we will force to do garbage collection if any policy removed in
xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini()
first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini()
-> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer
dereference when check percpu_empty. The code path looks like:

flow_cache_fini()
  - fc->percpu = NULL
xfrm_policy_fini()
  - xfrm_policy_flush()
    - xfrm_garbage_collect()
      - flow_cache_flush()
        - flow_cache_percpu_empty()
	  - fcp = per_cpu_ptr(fc->percpu, cpu)

To reproduce, just add ipsec in netns and then remove the netns.

v2:
As Xin Long suggested, since only two other places need to call it. move
xfrm_garbage_collect() outside xfrm_policy_flush().

v3:
Fix subject mismatch after v2 fix.

Fixes: 35db069121 ("xfrm: do the garbage collection after flushing policy")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-12 11:51:21 +02:00
Antony Antony
8bafd73093 xfrm: add UDP encapsulation port in migrate message
Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
to userland. Only add if XFRMA_ENCAP was in user migrate request.

Signed-off-by: Antony Antony <antony@phenome.org>
Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:35:54 +02:00
Antony Antony
4ab47d47af xfrm: extend MIGRATE with UDP encapsulation port
Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
netlink attribute XFRMA_ENCAP.

The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8)
could go to sleep for a few minutes and wake up. When it wake up the
NAT mapping could have expired, the device send a MOBIKE UPDATE_SA
message to migrate the IPsec SA. The change could be a change UDP
encapsulation port, IP address, or both.

Reported-by: Paul Wouters <pwouters@redhat.com>
Signed-off-by: Antony Antony <antony@phenome.org>
Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:25:58 +02:00
Hangbin Liu
b81f884a54 xfrm: fix xfrm_dev_event() missing when compile without CONFIG_XFRM_OFFLOAD
In commit d77e38e612 ("xfrm: Add an IPsec hardware offloading API") we
make xfrm_device.o only compiled when enable option CONFIG_XFRM_OFFLOAD.
But this will make xfrm_dev_event() missing if we only enable default XFRM
options.

Then if we set down and unregister an interface with IPsec on it. there
will no xfrm_garbage_collect(), which will cause dev usage count hold and
get error like:

unregister_netdevice: waiting for <dev> to become free. Usage count = 4

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-07 08:16:27 +02:00
Antony Antony
a486cd2366 xfrm: fix state migration copy replay sequence numbers
During xfrm migration copy replay and preplay sequence numbers
from the previous state.

Here is a tcpdump output showing the problem.
10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder.
After the migration it sent wrong sequence number, reset to 1.
The migration is from 10.0.0.52 to 10.0.0.53.

IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136

IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]

IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136

NOTE: next sequence is wrong 0x1

IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136

Signed-off-by: Antony Antony <antony@phenome.org>
Reviewed-by: Richard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-19 12:49:13 +02:00
Wei Yongjun
24d472e4e4 xfrm: Make function xfrm_dev_register static
Fixes the following sparse warning:

net/xfrm/xfrm_device.c:141:5: warning:
 symbol 'xfrm_dev_register' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-19 11:42:39 +02:00
Geliang Tang
a133d93054 xfrm: use memdup_user
Use memdup_user() helper instead of open-coding to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-16 07:32:25 +02:00
Ilan Tayari
2c1497bbc8 xfrm: Fix NETDEV_DOWN with IPSec offload
Upon NETDEV_DOWN event, all xfrm_state objects which are bound to
the device are flushed.

The condition for this is wrong, though, testing dev->hw_features
instead of dev->features. If a device has non-user-modifiable
NETIF_F_HW_ESP, then its xfrm_state objects are not flushed,
causing a crash later on after the device is deleted.

Check dev->features instead of dev->hw_features.

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-08 09:41:09 +02:00
Sabrina Dubroca
9b3eb54106 xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
that dst. Unfortunately, the code that allocates and fills this copy
doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
passed. In multiple code paths (from raw_sendmsg, from TCP when
replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
passed to xfrm is actually an on-stack flowi4, so we end up reading
stuff from the stack past the end of the flowi4 struct.

Since xfrm_dst->origin isn't used anywhere following commit
ca116922af ("xfrm: Eliminate "fl" and "pol" args to
xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
either, so get rid of that too.

Fixes: 9d6ec93801 ("ipv4: Use flowi4 in public route lookup interfaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-05-04 07:30:59 +02:00
Linus Torvalds
8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfitler network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Linus Torvalds
5a0387a8a8 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.12:

  API:
   - Add batch registration for acomp/scomp
   - Change acomp testing to non-unique compressed result
   - Extend algorithm name limit to 128 bytes
   - Require setkey before accept(2) in algif_aead

  Algorithms:
   - Add support for deflate rfc1950 (zlib)

  Drivers:
   - Add accelerated crct10dif for powerpc
   - Add crc32 in stm32
   - Add sha384/sha512 in ccp
   - Add 3des/gcm(aes) for v5 devices in ccp
   - Add Queue Interface (QI) backend support in caam
   - Add new Exynos RNG driver
   - Add ThunderX ZIP driver
   - Add driver for hardware random generator on MT7623 SoC"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (101 commits)
  crypto: stm32 - Fix OF module alias information
  crypto: algif_aead - Require setkey before accept(2)
  crypto: scomp - add support for deflate rfc1950 (zlib)
  crypto: scomp - allow registration of multiple scomps
  crypto: ccp - Change ISR handler method for a v5 CCP
  crypto: ccp - Change ISR handler method for a v3 CCP
  crypto: crypto4xx - rename ce_ring_contol to ce_ring_control
  crypto: testmgr - Allow ecb(cipher_null) in FIPS mode
  Revert "crypto: arm64/sha - Add constant operand modifier to ASM_EXPORT"
  crypto: ccp - Disable interrupts early on unload
  crypto: ccp - Use only the relevant interrupt bits
  hwrng: mtk - Add driver for hardware random generator on MT7623 SoC
  dt-bindings: hwrng: Add Mediatek hardware random generator bindings
  crypto: crct10dif-vpmsum - Fix missing preempt_disable()
  crypto: testmgr - replace compression known answer test
  crypto: acomp - allow registration of multiple acomps
  hwrng: n2 - Use devm_kcalloc() in n2rng_probe()
  crypto: chcr - Fix error handling related to 'chcr_alloc_shash'
  padata: get_next is never NULL
  crypto: exynos - Add new Exynos RNG driver
  ...
2017-05-02 15:53:46 -07:00
Ilan Tayari
152afb9b45 xfrm: Indicate xfrm_state offload errors
Current code silently ignores driver errors when configuring
IPSec offload xfrm_state, and falls back to host-based crypto.

Fail the xfrm_state creation if the driver has an error, because
the NIC offloading was explicitly requested by the user program.

This will communicate back to the user that there was an error.

Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 14:59:39 -04:00
Sabrina Dubroca
cfcf99f987 xfrm: fix GRO for !CONFIG_NETFILTER
In xfrm_input() when called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). GRO path will
always skip the NF_HOOK, so we don't need the special-case for
!NETFILTER during GRO processing.

Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-27 12:20:19 +02:00
Xin Long
35db069121 xfrm: do the garbage collection after flushing policy
Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
These is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.

It's no good that the policy is gone but the xdst still hold there.
The worse thing is that xdst->route/orig_dst is also hold and can
not be released even if the orig_dst is already expired.

This patch is to do the garbage collection if there is any policy
removed in xfrm_policy_flush.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-26 10:34:32 +02:00
David S. Miller
6b633e82b0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-04-20

This adds the basic infrastructure for IPsec hardware
offloading, it creates a configuration API and adjusts
the packet path.

1) Add the needed netdev features to configure IPsec offloads.

2) Add the IPsec hardware offloading API.

3) Prepare the ESP packet path for hardware offloading.

4) Add gso handlers for esp4 and esp6, this implements
   the software fallback for GSO packets.

5) Add xfrm replay handler functions for offloading.

6) Change ESP to use a synchronous crypto algorithm on
   offloading, we don't have the option for asynchronous
   returns when we handle IPsec at layer2.

7) Add a xfrm validate function to validate_xmit_skb. This
   implements the software fallback for non GSO packets.

8) Set the inner_network and inner_transport members of
   the SKB, as well as encapsulation, to reflect the actual
   positions of these headers, and removes them only once
   encryption is done on the payload.
   From Ilan Tayari.

9) Prepare the ESP GRO codepath for hardware offloading.

10) Fix incorrect null pointer check in esp6.
    From Colin Ian King.

11) Fix for the GSO software fallback path to detect the
    fallback correctly.
    From Ilan Tayari.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 15:11:28 -04:00
Steffen Klassert
bcd1f8a45e xfrm: Prepare the GRO codepath for hardware offloading.
On IPsec hardware offloading, we already get a secpath with
valid state attached when the packet enters the GRO handlers.
So check for hardware offload and skip the state lookup in this
case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:07:49 +02:00
Ilan Tayari
f1bd7d659e xfrm: Add encapsulation header offsets while SKB is not encrypted
Both esp4 and esp6 used to assume that the SKB payload is encrypted
and therefore the inner_network and inner_transport offsets are
not relevant.
When doing crypto offload in the NIC, this is no longer the case
and the NIC driver needs these offsets so it can do TX TCP checksum
offloading.
This patch sets the inner_network and inner_transport members of
the SKB, as well as encapsulation, to reflect the actual positions
of these headers, and removes them only once encryption is done
on the payload.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:07:39 +02:00
Steffen Klassert
f6e27114a6 net: Add a xfrm validate function to validate_xmit_skb
When we do IPsec offloading, we need a fallback for
packets that were targeted to be IPsec offloaded but
rerouted to a device that does not support IPsec offload.
For that we add a function that checks the offloading
features of the sending device and and flags the
requirement of a fallback before it calls the IPsec
output function. The IPsec output function adds the IPsec
trailer and does encryption if needed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:07:28 +02:00
Steffen Klassert
d7dbefc45c xfrm: Add xfrm_replay_overflow functions for offloading
This patch adds functions that handles IPsec sequence
numbers for GSO segments and TSO offloading. We need
to calculate and update the sequence numbers based
on the segments that GSO/TSO will generate. We need
this to keep software and hardware sequence number
counter in sync.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:07:01 +02:00
Steffen Klassert
7862b4058b esp: Add gso handlers for esp4 and esp6
This patch extends the xfrm_type by an encap function pointer
and implements esp4_gso_encap and esp6_gso_encap. These functions
doing the basic esp encapsulation for a GSO packet. In case the
GSO packet needs to be segmented in software, we add gso_segment
functions. This codepath is going to be used on esp hardware
offloads.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:06:50 +02:00
Steffen Klassert
d77e38e612 xfrm: Add an IPsec hardware offloading API
This patch adds all the bits that are needed to do
IPsec hardware offload for IPsec states and ESP packets.
We add xfrmdev_ops to the net_device. xfrmdev_ops has
function pointers that are needed to manage the xfrm
states in the hardware and to do a per packet
offloading decision.

Joint work with:
Ilan Tayari <ilant@mellanox.com>
Guy Shapiro <guysh@mellanox.com>
Yossi Kuperman <yossiku@mellanox.com>

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:06:10 +02:00
Steffen Klassert
21f42cc95f xfrm: Move device notifications to a sepatate file
This is needed for the upcomming IPsec device offloading.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:05:53 +02:00
Steffen Klassert
9d389d7f84 xfrm: Add a xfrm type offload.
We add a struct  xfrm_type_offload so that we have the offloaded
codepath separated to the non offloaded codepath. With this the
non offloade and the offloaded codepath can coexist.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14 10:05:44 +02:00
Johannes Berg
fe52145f91 netlink: pass extended ACK struct where available
This is an add-on to the previous patch that passes the extended ACK
structure where it's already available by existing genl_info or extack
function arguments.

This was done with this spatch (with some manual adjustment of
indentation):

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, info->extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse(A, B, C, D, E, NULL)
+nla_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_validate(A, B, C, D, NULL)
+nlmsg_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate(A, B, C, D, NULL)
+nla_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate_nested(A, B, C, NULL)
+nla_validate_nested(A, B, C, extack)
...>
}

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg
fceb6435e8 netlink: pass extended ACK struct to parsing functions
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:22 -04:00
Johannes Berg
2d4bc93368 netlink: extended ACK reporting
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:20 -04:00
David S. Miller
c6606a87db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-04-11

1) Remove unused field from struct xfrm_mgr.

2) Code size optimizations for the xfrm prefix hash and
   address match.

3) Branch optimization for addr4_match.

All patches from Alexey Dobriyan.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 10:10:30 -04:00
Herbert Xu
633439f5b7 xfrm: Prepare for CRYPTO_MAX_ALG_NAME expansion
This patch fixes the xfrm_user code to use the actual array size
rather than the hard-coded CRYPTO_MAX_ALG_NAME length.  This is
because the array size is fixed at 64 bytes while we want to increase
the in-kernel CRYPTO_MAX_ALG_NAME value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-10 19:17:26 +08:00
Linus Torvalds
52b9c81680 Merge branch 'apw' (xfrm_user fixes)
Merge xfrm_user validation fixes from Andy Whitcroft:
 "Two patches we are applying to Ubuntu for XFRM_MSG_NEWAE validation
  issue reported by ZDI.

  The first of these is the primary fix, and the second is for a more
  theoretical issue that Kees pointed out when reviewing the first"

* emailed patches from Andy Whitcroft <apw@canonical.com>:
  xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
  xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
2017-03-29 13:26:22 -07:00
Andy Whitcroft
f843ee6dd0 xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues.  To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 08:40:15 -07:00
Andy Whitcroft
677e806da4 xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-29 08:40:06 -07:00
Alexey Dobriyan
d7f6946630 xfrm: use "unsigned int" in __xfrm6_pref_hash()
x86_64 is zero-extending arch so "unsigned int" is preferred over "int"
for address calculations.

Space savings:

	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-58 (-58)
	function                                     old     new   delta
	xfrm_hash_resize                            2752    2743      -9
	policy_hash_bysel                            985     973     -12
	policy_hash_direct                          1036     999     -37

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24 07:03:12 +01:00
Alexey Dobriyan
1560875600 xfrm: remove unused struct xfrm_mgr::id
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-03-24 07:03:12 +01:00
David S. Miller
8474c8caac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2017-03-06

1) Fix lockdep splat on xfrm policy subsystem initialization.
   From Florian Westphal.

2) When using socket policies on IPv4-mapped IPv6 addresses,
   we access the flow informations of the wrong address family
   what leads to an out of bounds access. Fix this by using
   the family we get with the dst_entry, like we do it for the
   standard policy lookup.

3) vti6 can report a PMTU below IPV6_MIN_MTU. Fix this by
   adding a check for that before sending a ICMPV6_PKT_TOOBIG
   message.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 15:00:37 -08:00
Julian Anastasov
1ecc9ad02c xfrm: provide correct dst in xfrm_neigh_lookup
Fix xfrm_neigh_lookup to provide dst->path to the
neigh_lookup dst_ops method.

When skb is provided, the IP address in packet should already
match the dst->path address family. But for the non-skb case,
we should consider the last tunnel address as nexthop address.

Fixes: f894cbf847 ("net: Add optional SKB arg to dst_ops->neigh_lookup().")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-26 21:35:24 -05:00
David S. Miller
99d5ceeea5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-02-16

1) Make struct xfrm_input_afinfo const, nothing writes to it.
   From Florian Westphal.

2) Remove all places that write to the afinfo policy backend
   and make the struct const then.
   From Florian Westphal.

3) Prepare for packet consuming gro callbacks and add
   ESP GRO handlers. ESP packets can be decapsulated
   at the GRO layer then. It saves a round through
   the stack for each ESP packet.

Please note that this has a merge coflict between commit

63fca65d08 ("net: add confirm_neigh method to dst_ops")

from net-next and

3d7d25a68e ("xfrm: policy: remove garbage_collect callback")
a2817d8b27 ("xfrm: policy: remove family field")

from ipsec-next.

The conflict can be solved as it is done in linux-next.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-16 21:25:49 -05:00
Steffen Klassert
7785bba299 esp: Add a software GRO codepath
This patch adds GRO ifrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected it into layer 2.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:11 +01:00
Steffen Klassert
54ef207ac8 xfrm: Extend the sec_path for IPsec offloading
We need to keep per packet offloading informations across
the layers. So we extend the sec_path to carry these for
the input and output offload codepath.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 11:04:10 +01:00
Steffen Klassert
1e29537034 xfrm: Export xfrm_parse_spi.
We need it in the ESP offload handlers, so export it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:49 +01:00
Steffen Klassert
25393d3fc0 net: Prepare gro for packet consuming gro callbacks
The upcomming IPsec ESP gro callbacks will consume the skb,
so prepare for that.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:44 +01:00
Steffen Klassert
b0fcee825c xfrm: Add a secpath_set helper.
Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-15 09:39:24 +01:00
Steffen Klassert
4c86d77743 xfrm: Don't use sk_family for socket policy lookups
On IPv4-mapped IPv6 addresses sk_family is AF_INET6,
but the flow informations are created based on AF_INET.
So the routing set up 'struct flowi4' but we try to
access 'struct flowi6' what leads to an out of bounds
access. Fix this by using the family we get with the
dst_entry, like we do it for the standard policy lookup.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-14 12:34:30 +01:00
Florian Westphal
37b103830e xfrm: policy: make policy backend const
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:19 +01:00
Florian Westphal
bdba9fe01e xfrm: policy: remove xfrm_policy_put_afinfo
Alternative is to keep it an make the (unused) afinfo arg const to avoid
the compiler warnings once the afinfo structs get constified.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
a2817d8b27 xfrm: policy: remove family field
Only needed it to register the policy backend at init time.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
3d7d25a68e xfrm: policy: remove garbage_collect callback
Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows to
constify afinfo later on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:18 +01:00
Florian Westphal
2b61997aa0 xfrm: policy: xfrm_policy_unregister_afinfo can return void
Nothing checks the return value. Also, the errors returned on unregister
are impossible (we only support INET and INET6, so no way
xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo'
itself).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
f5e2bb4f5b xfrm: policy: xfrm_get_tos cannot fail
The comment makes it look like get_tos() is used to validate something,
but it turns out the comment was about xfrm_find_bundle() which got removed
years ago.

xfrm_get_tos will return either the tos (ipv4) or 0 (ipv6).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
960fdfdeb9 xfrm: input: constify xfrm_input_afinfo
Nothing writes to these structures (the module owner was not used).

While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 10:22:17 +01:00
Florian Westphal
c282222a45 xfrm: policy: init locks early
Dmitry reports following splat:
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
[..]
 spin_lock_bh include/linux/spinlock.h:304 [inline]
 xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
 ops_init+0x10a/0x530 net/core/net_namespace.c:115
 setup_net+0x2ed/0x690 net/core/net_namespace.c:291
 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
 SYSC_unshare kernel/fork.c:2281 [inline]

Problem is that when we get error during xfrm_net_init we will call
xfrm_policy_fini which will acquire xfrm_policy_lock before it was
initialized.  Just move it around so locks get set up first.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-02-09 09:52:28 +01:00
Eric Dumazet
97e219b7c1 gro_cells: move to net/core/gro_cells.c
We have many gro cells users, so lets move the code to avoid
duplication.

This creates a CONFIG_GRO_CELLS option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-08 14:38:18 -05:00
Julian Anastasov
63fca65d08 net: add confirm_neigh method to dst_ops
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

For XFRM prefer the last tunnel address, if present. With help
from Steffen Klassert.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 13:07:46 -05:00
Steffen Klassert
1995876a06 xfrm: Add a dummy network device for napi.
This patch adds a dummy network device so that we can
use gro_cells for IPsec GRO. With this, we handle IPsec
GRO with no impact on the generic networking code.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30 06:45:43 +01:00
Florian Westphal
3819a35fdb xfrm: fix possible null deref in xfrm_init_tempstate
Dan reports following smatch warning:
 net/xfrm/xfrm_state.c:659
 error: we previously assumed 'afinfo' could be null (see line 651)

 649  struct xfrm_state_afinfo *afinfo = xfrm_state_afinfo_get_rcu(family);
 651  if (afinfo)
		...
 658  }
 659  afinfo->init_temprop(x, tmpl, daddr, saddr);

I am resonably sure afinfo cannot be NULL here.

xfrm_state4.c and state6.c are both part of ipv4/ipv6 (depends on
CONFIG_XFRM, a boolean) but even if ipv6 is a module state6.c can't
be removed (ipv6 lacks module_exit so it cannot be removed).

The only callers for xfrm6_fini that leads to state backend unregister
are error unwinding paths that can be called during ipv6 init function.

So after ipv6 module is loaded successfully the state backend cannot go
away anymore.

The family value from policy lookup path is taken from dst_entry, so
that should always be AF_INET(6).

However, since this silences the warning and avoids readers of this
code wondering about possible null deref it seems preferrable to
be defensive and just add the old check back.

Fixes: 711059b975 ("xfrm: add and use xfrm_state_afinfo_get_rcu")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-16 08:36:33 +01:00
Florian Westphal
75cda62d9c xfrm: state: simplify rcu_read_unlock handling in two spots
Instead of:
  if (foo) {
      unlock();
      return bar();
   }
   unlock();
do:
   unlock();
   if (foo)
       return bar();

This is ok because rcu protected structure is only dereferenced before
the conditional.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:14 +01:00
Florian Westphal
711059b975 xfrm: add and use xfrm_state_afinfo_get_rcu
xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.

While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.

A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
af5d27c4e1 xfrm: remove xfrm_state_put_afinfo
commit 44abdc3047
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.

Use spatch to replace it with direct calls to rcu_read_unlock:

@@
struct xfrm_state_afinfo *a;
@@

-  xfrm_state_put_afinfo(a);
+  rcu_read_unlock();

old:
 text    data     bss     dec     hex filename
22570      72     424   23066    5a1a xfrm_state.o
 1612       0       0    1612     64c xfrm_output.o
new:
22554      72     424   23050    5a0a xfrm_state.o
 1596       0       0    1596     63c xfrm_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
423826a7b1 xfrm: avoid rcu sparse warning
xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces)

Harmless, but lets fix it to reduce the noise.

While at it, get rid of unneeded NULL check, its never hit:

net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo);
net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo);
net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo);

... are the only callsites.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal
b3b73b8e6d xfrm: state: do not acquire lock in get_mtu helpers
Once flow cache gets removed the mtu initialisation happens for every skb
that gets an xfrm attached, so this lock starts to show up in perf.

It is not obvious why this lock is required -- the caller holds
reference on the state struct, type->destructor is only called from the
state gc worker (all state structs on gc list must have refcount 0).

xfrm_init_state already has been called (else private data accessed
by type->get_mtu() would not be set up).

So just remove the lock -- the race on the state (DEAD?) doesn't
matter (could change right after dropping the lock too).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-06 08:44:56 +01:00
Alexander Alemayhu
1365e547c6 xfrm: trivial typos
o s/descentant/descendant
o s/workarbound/workaround

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-04 06:49:28 +01:00
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds
e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
David S. Miller
2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
Yi Zhao
83e2d0587a xfrm_user: fix return value from xfrm_user_rcv_msg
It doesn't support to run 32bit 'ip' to set xfrm objdect on 64bit host.
But the return value is unknown for user program:

ip xfrm policy list
RTNETLINK answers: Unknown error 524

Replace ENOTSUPP with EOPNOTSUPP:

ip xfrm policy list
RTNETLINK answers: Operation not supported

Signed-off-by: Yi Zhao <yi.zhao@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-11-30 10:58:53 +01:00
Florian Westphal
330e832abd xfrm: unbreak xfrm_sk_policy_lookup
if we succeed grabbing the refcount, then
  if (err && !xfrm_pol_hold_rcu)

will evaluate to false so this hits last else branch which then
sets policy to ERR_PTR(0).

Fixes: ae33786f73 ("xfrm: policy: only use rcu in xfrm_sk_policy_lookup")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-11-18 07:00:05 +01:00
Sebastian Andrzej Siewior
a4fc1bfc42 net/flowcache: Convert to hotplug state machine
Install the callbacks via the state machine. Use multi state support to avoid
custom list handling for the multiple instances.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20161103145021.28528-10-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:28 +01:00
David S. Miller
1d00578836 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2016-10-25

Just a leftover from the last development cycle.

1) Remove some unused code, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-28 13:26:27 -04:00
Florian Westphal
2258d927a6 xfrm: remove unused helper
Not used anymore since 2009 (9e0d57fd6d,
'xfrm: SAD entries do not expire correctly after suspend-resume').

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-30 08:20:56 +02:00
Jia He
07613873f1 proc: Reduce cache miss in xfrm_statistics_seq_show
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.

Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30 01:50:45 -04:00
David S. Miller
1678c1134f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2016-09-23

Only two patches this time:

1) Fix a comment reference to struct xfrm_replay_state_esn.
   From Richard Guy Briggs.

2) Convert xfrm_state_lookup to rcu, we don't need the
   xfrm_state_lock anymore in the input path.
   From Florian Westphal.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24 08:18:19 -04:00
David S. Miller
d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Florian Westphal
c2f672fc94 xfrm: state lookup can be lockless
This is called from the packet input path, we get lock contention
if many cpus handle ipsec in parallel.

After recent rcu conversion it is safe to call __xfrm_state_lookup
without the spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-21 12:37:29 +02:00
Ilan Tayari
b588479358 xfrm: Fix memory leak of aead algorithm name
commit 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
introduced aead. The function attach_aead kmemdup()s the algorithm
name during xfrm_state_construct().
However this memory is never freed.
Implementation has since been slightly modified in
commit ee5c23176f ("xfrm: Clone states properly on migration")
without resolving this leak.
This patch adds a kfree() call for the aead algorithm name.

Fixes: 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-19 12:08:58 +02:00
David S. Miller
b20b378d49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12 15:52:44 -07:00
Javier Martinez Canillas
65b323e2ff xfrm: use IS_ENABLED() instead of checking for built-in or module
The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
built-in or as a module, use that macro instead of open coding the same.

Using the macro makes the code more readable by helping abstract away some
of the Kconfig built-in and module enable details.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:19:11 -07:00
Mathias Krause
2f30ea5090 xfrm_user: propagate sec ctx allocation errors
When we fail to attach the security context in xfrm_state_construct()
we'll return 0 as error value which, in turn, will wrongly claim success
to userland when, in fact, we won't be adding / updating the XFRM state.

This is a regression introduced by commit fd21150a0f ("[XFRM] netlink:
Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()").

Fix it by propagating the error returned by security_xfrm_state_alloc()
in this case.

Fixes: fd21150a0f ("[XFRM] netlink: Inline attach_encap_tmpl()...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-09 09:02:08 +02:00
David S. Miller
575f9c43e7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
ipsec-next 2016-09-08

1) Constify the xfrm_replay structures. From Julia Lawall

2) Protect xfrm state hash tables with rcu, lookups
   can be done now without acquiring xfrm_state_lock.
   From Florian Westphal.

3) Protect xfrm policy hash tables with rcu, lookups
   can be done now without acquiring xfrm_policy_lock.
   From Florian Westphal.

4) We don't need to have a garbage collector list per
   namespace anymore, so use a global one instead.
   From Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 13:09:41 -07:00
subashab@codeaurora.org
0f76d25644 net: xfrm: Change u32 sysctl entries to use proc_douintvec
proc_dointvec limits the values to INT_MAX in u32 sysctl entries.
proc_douintvec allows to write upto UINT_MAX.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-07 23:17:53 -07:00
Florian Westphal
35db57bbc4 xfrm: state: remove per-netns gc task
After commit 5b8ef3415a
("xfrm: Remove ancient sleeping when the SA is in acquire state")
gc does not need any per-netns data anymore.

As far as gc is concerned all state structs are the same, so we
can use a global work struct for it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-24 13:16:06 +02:00
Steffen Klassert
4141b36ab1 xfrm: Fix xfrm_policy_lock imbalance
An earlier patch accidentally replaced a write_lock_bh
with a spin_unlock_bh. Fix this by using spin_lock_bh
instead.

Fixes: 9d0380df62 ("xfrm: policy: convert policy_lock to spinlock")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-24 13:13:08 +02:00
Florian Westphal
9d0380df62 xfrm: policy: convert policy_lock to spinlock
After earlier patches conversions all spots acquire the writer lock and
we can now convert this to a normal spinlock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:12 +02:00
Florian Westphal
d5b8f86dc7 xfrm: policy: don't acquire policy lock in xfrm_spd_getinfo
It doesn't seem that important.

We now get inconsistent view of the counters, but those are stale anyway
right after we drop the lock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:12 +02:00
Florian Westphal
ae33786f73 xfrm: policy: only use rcu in xfrm_sk_policy_lookup
Don't acquire the readlock anymore and rely on rcu alone.

In case writer on other CPU changed policy at the wrong moment (after we
obtained sk policy pointer but before we could obtain the reference)
just repeat the lookup.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:12 +02:00
Florian Westphal
a7c44247f7 xfrm: policy: make xfrm_policy_lookup_bytype lockless
side effect: no longer disables BH (should be fine).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:12 +02:00
Florian Westphal
e37cc8ade5 xfrm: policy: use atomic_inc_not_zero in rcu section
If we don't hold the policy lock anymore the refcnt might
already be 0, i.e. policy struct is about to be free'd.

Switch to atomic_inc_not_zero to avoid this.

On removal policies are already unlinked from the tables (lists)
before the last _put occurs so we are not supposed to find the same
'dead' entry on the next loop, so its safe to just repeat the lookup.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:11 +02:00
Florian Westphal
30846090a7 xfrm: policy: add sequence count to sync with hash resize
Once xfrm_policy_lookup_bytype doesn't grab xfrm_policy_lock anymore its
possible for a hash resize to occur in parallel.

Use sequence counter to block lookup in case a resize is in
progress and to also re-lookup in case hash table was altered
in the mean time (might cause use to not find the best-match).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:11 +02:00
Florian Westphal
e1e551bc56 xfrm: policy: prepare policy_bydst hash for rcu lookups
Since commit 56f047305d
("xfrm: add rcu grace period in xfrm_policy_destroy()") xfrm policy
objects are already free'd via rcu.

In order to make more places lockless (i.e. use rcu_read_lock instead of
grabbing read-side of policy rwlock) we only need to:

- use rcu_assign_pointer to store address of new hash table backend memory
- add rcu barrier so that freeing of old memory is delayed (expansion
  and free happens from system workqueue, so synchronize_rcu is fine)
- use rcu_dereference to fetch current address of the hash table.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:11 +02:00
Florian Westphal
a5eefc1df6 xfrm: policy: use rcu versions for iteration and list add/del
This is required once we allow lockless readers.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-08-12 08:07:11 +02:00