Commit graph

93 commits

Author SHA1 Message Date
Andreas Gruenbacher
3337ab08d0 iov_iter: Introduce nofault flag to disable page faults
Introduce a new nofault flag to indicate to iov_iter_get_pages not to
fault in user pages.

This is implemented by passing the FOLL_NOFAULT flag to get_user_pages,
which causes get_user_pages to fail when it would otherwise fault in a
page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting
in pages when page faults are not allowed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-24 15:26:06 +02:00
Andreas Gruenbacher
cdd591fc86 iov_iter: Introduce fault_in_iov_iter_writeable
Introduce a new fault_in_iov_iter_writeable helper for safely faulting
in an iterator for writing.  Uses get_user_pages() to fault in the pages
without actually writing to them, which would be destructive.

We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that
the iterator passed to .read_iter isn't in memory.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20 19:33:07 +02:00
Andreas Gruenbacher
a6294593e8 iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
Turn iov_iter_fault_in_readable into a function that returns the number
of bytes not faulted in, similar to copy_to_user, instead of returning a
non-zero value when any of the requested pages couldn't be faulted in.
This supports the existing users that require all pages to be faulted in
as well as new users that are happy if any pages can be faulted in.

Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
sure this change doesn't silently break things.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18 16:35:06 +02:00
Jens Axboe
7dedd3e180 Revert "iov_iter: track truncated size"
This reverts commit 2112ff5ce0.

We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15 09:22:35 -06:00
Jens Axboe
8fb0f47a9d iov_iter: add helper to save iov_iter state
In an ideal world, when someone is passed an iov_iter and returns X bytes,
then X bytes would have been consumed/advanced from the iov_iter. But we
have use cases that always consume the entire iterator, a few examples
of that are iomap and bdev O_DIRECT. This means we cannot rely on the
state of the iov_iter once we've called ->read_iter() or ->write_iter().

This would be easier if we didn't always have to deal with truncate of
the iov_iter, as rewinding would be trivial without that. We recently
added a commit to track the truncate state, but that grew the iov_iter
by 8 bytes and wasn't the best solution.

Implement a helper to save enough of the iov_iter state to sanely restore
it after we've called the read/write iterator helpers. This currently
only works for IOVEC/BVEC/KVEC as that's all we need, support for other
iterator types are left as an exercise for the reader.

Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14 08:12:18 -06:00
Pavel Begunkov
2112ff5ce0 iov_iter: track truncated size
Remember how many bytes were truncated and reverted back. Because
not reexpanded iterators don't always work well with reverting, we may
need to know that to reexpand ourselves when needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-09-03 19:31:33 -04:00
Al Viro
f0b65f39ac iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it.  In case when they end
up consuming less than they'd been given they need to do iov_iter_revert() on
everything they had not consumed.  That, however, needs to be done only on slow
paths.

All in-tree callers converted.  And that kills the last user of iterate_all_kinds()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:14 -04:00
Al Viro
8409a0d261 sanitize iov_iter_fault_in_readable()
1) constify iov_iter argument; we are not advancing it in this primitive.

2) cap the amount requested by the amount of data in iov_iter.  All
existing callers should've been safe, but the check is really cheap and
doing it here makes for easier analysis, as well as more consistent
semantics among the primitives.

3) don't bother with iterate_iovec().  Explicit loop is not any harder
to follow, and we get rid of standalone iterate_iovec() users - it's
only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:10 -04:00
Al Viro
8cd54c1c84 iov_iter: separate direction from flavour
Instead of having them mixed in iter->type, use separate ->iter_type
and ->data_source (u8 and bool resp.)  And don't bother with (pseudo-)
bitmap for the former - microoptimizations from being able to check
if the flavour is one of two values are not worth the confusion for
optimizer.  It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE,
so we end up with extra headache.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:08 -04:00
Al Viro
4b6c132b7d iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert()
Use corresponding plain variants, revert on short copy.  That's the way it
should've been done from the very beginning, except that we didn't have
iov_iter_revert() back then...

[fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:44:23 -04:00
David Howells
66cd071a1f iov_iter: Remove iov_iter_for_each_range()
Remove iov_iter_for_each_range() as it's no longer used with the removal of
lustre.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03 10:36:49 -04:00
David Howells
3d14ec1fe6 iov_iter: Four fixes for ITER_XARRAY
Fix four things[1] in the patch that adds ITER_XARRAY[2]:

 (1) Remove the address_space struct predeclaration.  This is a holdover
     from when it was ITER_MAPPING.

 (2) Fix _copy_mc_to_iter() so that the xarray segment updates count and
     iov_offset in the iterator before returning.

 (3) Fix iov_iter_alignment() to not loop in the xarray case.  Because the
     middle pages are all whole pages, only the end pages need be
     considered - and this can be reduced to just looking at the start
     position in the xarray and the iteration size.

 (4) Fix iov_iter_advance() to limit the size of the advance to no more
     than the remaining iteration size.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1]
Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]
2021-04-26 22:55:12 +01:00
David Howells
7ff5062079 iov_iter: Add ITER_XARRAY
Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified amount of bytes.  The iterator supports transparent huge pages.

The iterate_xarray() macro calls the helper function with rcu_access()
helped.  I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.

Changes:
v7:
 - Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Christoph Hellwig <hch@lst.de>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/27c369a8f42bb8a617672b2dc0126a5c6df5a050.camel@kernel.org [1]
2021-04-23 09:15:32 +01:00
Willem de Bruijn
52cbd23a11 udp: fix skb_copy_and_csum_datagram with odd segment sizes
When iteratively computing a checksum with csum_block_add, track the
offset "pos" to correctly rotate in csum_block_add when offset is odd.

The open coded implementation of skb_copy_and_csum_datagram did this.
With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
pos was reinitialized to 0 on each call.

Bring back the pos by passing it along with the csum to the callback.

Changes v1->v2
  - pass csum value, instead of csump pointer (Alexander Duyck)

Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5 ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute <oliver.graute@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:56:56 -08:00
Linus Torvalds
85ed13e78d Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat iovec cleanups from Al Viro:
 "Christoph's series around import_iovec() and compat variant thereof"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  security/keys: remove compat_keyctl_instantiate_key_iov
  mm: remove compat_process_vm_{readv,writev}
  fs: remove compat_sys_vmsplice
  fs: remove the compat readv/writev syscalls
  fs: remove various compat readv/writev helpers
  iov_iter: transparently handle compat iovecs in import_iovec
  iov_iter: refactor rw_copy_check_uvector and import_iovec
  iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
  compat.h: fix a spelling error in <linux/compat.h>
2020-10-12 16:35:51 -07:00
Dan Williams
ec6347bb43 x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

  On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
  >
  > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
  > >
  > > However now I see that copy_user_generic() works for the wrong reason.
  > > It works because the exception on the source address due to poison
  > > looks no different than a write fault on the user address to the
  > > caller, it's still just a short copy. So it makes copy_to_user() work
  > > for the wrong reason relative to the name.
  >
  > Right.
  >
  > And it won't work that way on other architectures. On x86, we have a
  > generic function that can take faults on either side, and we use it
  > for both cases (and for the "in_user" case too), but that's an
  > artifact of the architecture oddity.
  >
  > In fact, it's probably wrong even on x86 - because it can hide bugs -
  > but writing those things is painful enough that everybody prefers
  > having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

 [ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-06 11:18:04 +02:00
Christoph Hellwig
89cd35c58b iov_iter: transparently handle compat iovecs in import_iovec
Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03 00:02:13 -04:00
Christoph Hellwig
bfdc59701d iov_iter: refactor rw_copy_check_uvector and import_iovec
Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies a iovec from userspace either into the provided
   stack buffer if it fits, or allocates a new buffer for it.  Returns
   the actually used iovec.  It also verifies that iov_len does fit a
   signed type, and handles compat iovecs if the compat flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec. It calls iovec_from_user, then validates each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total lenght checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there now are two loops over the iovecs.
   Given that the iovecs are cache hot this doesn't make a major
   difference

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03 00:01:56 -04:00
Herbert Xu
7999096fa9 iov_iter: Move unnecessary inclusion of crypto/hash.h
The header file linux/uio.h includes crypto/hash.h which pulls in
most of the Crypto API.  Since linux/uio.h is used throughout the
kernel this means that every tiny bit of change to the Crypto API
causes the entire kernel to get rebuilt.

This patch fixes this by moving it into lib/iov_iter.c instead
where it is actually used.

This patch also fixes the ifdef to use CRYPTO_HASH instead of just
CRYPTO which does not guarantee the existence of ahash.

Unfortunately a number of drivers were relying on linux/uio.h to
provide access to linux/slab.h.  This patch adds inclusions of
linux/slab.h as detected by build failures.

Also skbuff.h was relying on this to provide a declaration for
ahash_request.  This patch adds a forward declaration instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-30 09:34:23 -04:00
David Howells
8cefc107ca pipe: Use head and tail pointers for the ring, not cursor and length
Convert pipes to use head and tail pointers for the buffer ring rather than
pointer and length as the latter requires two atomic ops to update (or a
combined op) whereas the former only requires one.

 (1) The head pointer is the point at which production occurs and points to
     the slot in which the next buffer will be placed.  This is equivalent
     to pipe->curbuf + pipe->nrbufs.

     The head pointer belongs to the write-side.

 (2) The tail pointer is the point at which consumption occurs.  It points
     to the next slot to be consumed.  This is equivalent to pipe->curbuf.

     The tail pointer belongs to the read-side.

 (3) head and tail are allowed to run to UINT_MAX and wrap naturally.  They
     are only masked off when the array is being accessed, e.g.:

	pipe->bufs[head & mask]

     This means that it is not necessary to have a dead slot in the ring as
     head == tail isn't ambiguous.

 (4) The ring is empty if "head == tail".

     A helper, pipe_empty(), is provided for this.

 (5) The occupancy of the ring is "head - tail".

     A helper, pipe_occupancy(), is provided for this.

 (6) The number of free slots in the ring is "pipe->ring_size - occupancy".

     A helper, pipe_space_for_user() is provided to indicate how many slots
     userspace may use.

 (7) The ring is full if "head - tail >= pipe->ring_size".

     A helper, pipe_full(), is provided for this.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-10-31 15:12:34 +00:00
Linus Torvalds
a2d79c7174 for-5.3/io_uring-20190711
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0nUl4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplCaEACa7c7ybRgV6CZpS9PXXVqBJqYILRLBXsnn
 MXomrSdjKMs0Y928RAQHFNWh3HaHHtdvSmFvwOpfF5lyrYXVfc9MQ7brqarDp1t2
 f4jvkL63BVG2Zs/VL8QVAz+CwtCF39hduzUR/Y9j/4+rJNhSBMNJLY0nlB5weCTy
 MJetnLoQ9ETA2+xu49vAM/PFJgBynNUAyUer918y8QysJRj90/VnhieQmrVb4tpG
 Q4yZFKq4YPDs0tLEX4Nj6eJERcyW/4MC2oZ0aPXU4g2Dc3SVWaSNOo5WpkP+crGt
 0dbyLmhomteE6+Kaco1hAWIkG/RuvgiMzDizryi0enXP51edV3Vnwyg3MQSUhcnf
 Pn7vrDkajKBE9rFGlLy8V4gkKdS8XJQy2xA1MWm3aWgGl4v0j64EXIe0IhIK30vU
 25A9jLDcdgr74+Lw+vWLLd+oeGD0iFf6wiEp+3jzEdtfVNE/lD6yilTzbdz2V0UK
 8T1sRLMEkaG7CbxOVc1UAfcvObjuqQihEI0fQvl4yxV178h8mtWB87YmV2S2EhzP
 v6FSxiC1yZ7J+rwb/Mff7+1GoOgzrpS/zESk2WMTgcwVdiwFfv5eIC26ZNWObJ/x
 IY+4xRgTf2dEsjBeumOuBzxTfzrZb+pTO4GCa4O+t0UDQRIwl0y20pTXKtxU3y/U
 gKPXEjgXrQ==
 =jDiB
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3/io_uring-20190711' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
 "This contains:

   - Support for recvmsg/sendmsg as first class opcodes.

     I don't envision going much further down this path, as there are
     plans in progress to support potentially any system call in an
     async fashion through io_uring. But I think it does make sense to
     have certain core ops available directly, especially those that can
     support a "try this non-blocking" flag/mode. (me)

   - Handle generic short reads automatically.

     This can happen fairly easily if parts of the buffered read is
     cached. Since the application needs to issue another request for
     the remainder, just do this internally and save kernel/user
     roundtrip while providing a nicer more robust API. (me)

   - Support for linked SQEs.

     This allows SQEs to depend on each other, enabling an application
     to eg queue a read-from-this-file,write-to-that-file pair. (me)

   - Fix race in stopping SQ thread (Jackie)"

* tag 'for-5.3/io_uring-20190711' of git://git.kernel.dk/linux-block:
  io_uring: fix io_sq_thread_stop running in front of io_sq_thread
  io_uring: add support for recvmsg()
  io_uring: add support for sendmsg()
  io_uring: add support for sqe links
  io_uring: punt short reads to async context
  uio: make import_iovec()/compat_import_iovec() return bytes on success
2019-07-13 10:36:53 -07:00
Christoph Hellwig
b620743077 block: never take page references for ITER_BVEC
If we pass pages through an iov_iter we always already have a reference
in the caller.  Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:32 -06:00
Jens Axboe
87e5e6dab6 uio: make import_iovec()/compat_import_iovec() return bytes on success
Currently these functions return < 0 on error, and 0 for success.
Change that so that we return < 0 on error, but number of bytes
for success.

Some callers already treat the return value that way, others need a
slight tweak.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-31 15:30:03 -06:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Ming Lei
f5eb4d3b92 iov_iter: fix iov_iter_type
Commit 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag
is stored into iter->type.

However, iov_iter_type() doesn't consider the new added flag, fix
it by masking this flag in iov_iter_type().

Fixes: 875f1d0769 ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01 08:38:47 -06:00
Linus Torvalds
1bdd3dbfff io_uring-20190323
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyWVysQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn5lD/0bEg76kbuwOUy5+FDqOpF0MNOU7xZcYcsI
 YkkaKkUi2YQL6NJlkU7AhtPwep+J2sgSnDW9Ho9WIXbsnsO6UF79uIdcix6zJGIl
 WnZZ3BLgWeciCfrzFpn3FFZnm/AKJSPWPmllUFvmUYT9GdRgN4ZnHBsS1HTlJ1m5
 5HhwLtaYOsZ75NxWBRqWspmtFe+XZ/CrjGgmvIF8FjSuIP2q0RrOmCF1XAA82umd
 ehiU1ZtQ+v4FHxmJWjzMWhrCj2y0gmPb+DotIqefFjVnd/G+LrFGMD1fsLoQVFDy
 L5VzSOGj1E4KXfDpIeGnz/08dpqXmOkvsSaNnv1U7vA7SCkbodR/BA1EKJrvk5v7
 MGkkcQDaU/WzC41RCyVQNWAWjzNLKbruXQ+1HqCx5eh7uthvMQMXDvGf4Jgeq+/E
 vGzrEKZ6qI78Vy0mXSy4dfFbFaNTjCkE2jbIG7BQx5zdtnS9/VPXNkpZxPrGLM+P
 /fTsLXghU9lKn6WHVtLpQsfJr0OMjyC9JA23pTX2G9MtBhDcyuRs+uCeQgG6cIkl
 F15LGuOY7YGYxRsegdinFaoldnHersUDx19c+uFdrB0k0A/A6KeGHuZx7aJPkW1L
 M89FkyJr2ZBgc26PvKz6j1Hwl2MKJC5h8TpPES/QnulWh4FbqqH3a501Qa1AQuxC
 1me95iy74w==
 =l4lx
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-block

Pull io_uring fixes and improvements from Jens Axboe:
 "The first five in this series are heavily inspired by the work Al did
  on the aio side to fix the races there.

  The last two re-introduce a feature that was in io_uring before it got
  merged, but which I pulled since we didn't have a good way to have
  BVEC iters that already have a stable reference. These aren't
  necessarily related to block, it's just how io_uring pins fixed
  buffers"

* tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
  block: add BIO_NO_PAGE_REF flag
  iov_iter: add ITER_BVEC_FLAG_NO_REF flag
  io_uring: mark me as the maintainer
  io_uring: retry bulk slab allocs as single allocs
  io_uring: fix poll races
  io_uring: fix fget/fput handling
  io_uring: add prepped flag
  io_uring: make io_read/write return an integer
  io_uring: use regular request ref counts
2019-03-23 10:25:12 -07:00
Jens Axboe
875f1d0769 iov_iter: add ITER_BVEC_FLAG_NO_REF flag
For ITER_BVEC, if we're holding on to kernel pages, the caller
doesn't need to grab a reference to the bvec pages, and drop that
same reference on IO completion. This is essentially safe for any
ITER_BVEC, but some use cases end up reusing pages and uncondtionally
dropping a page reference on completion. And example of that is
sendfile(2), that ends up being a splice_in + splice_out on the
pipe pages.

Add a flag that tells us it's fine to not grab a page reference
to the bvec pages, since that caller knows not to drop a reference
when it's done with the pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-18 10:44:48 -06:00
Christoph Hellwig
77000bc43d uio: remove the unused iov_for_each macro
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-04 10:59:50 -05:00
Sagi Grimberg
d05f443554 iov_iter: introduce hash_and_copy_to_iter helper
Allow consumers that want to use iov iterator helpers and also update
a predefined hash calculation online when copying data. This is useful
when copying incoming network buffers to a local iterator and calculate
a digest on the incoming stream. nvme-tcp host driver that will be
introduced in following patches is the first consumer via
skb_copy_and_hash_datagram_iter.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:54 +01:00
Sagi Grimberg
cb002d074d iov_iter: pass void csum pointer to csum_and_copy_to_iter
The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram
and we are trying to unite its logic with skb_copy_datagram_iter by passing
a callback to the copy function that we want to apply. Thus, we need
to make the checksum pointer private to the function.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 09:58:53 +01:00
David Howells
9ea9ce0427 iov_iter: Add I/O discard iterator
Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it.

This is useful in a network filesystem for discarding any unwanted data
sent by a server.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
aa563d7bca iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
00e2370744 iov_iter: Use accessor function
Use accessor functions to access an iterator's type and direction.  This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:40:44 +01:00
Dave Jiang
dfb06cba8c uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
copy_to_iter_mcsafe() is passing in the is_source parameter as "false"
to check_copy_size(). This is different than what copy_to_iter() does.
Also, the addr parameter passed to check_copy_size() is the source so
therefore we should be passing in "true" instead.

Fixes: 8780356ef6 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Cc: <stable@vger.kernel.org>
Reported-by: Fan Du <fan.du@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-12 14:58:47 -07:00
Dan Williams
522239b445 uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can
optionally select, and fixup the declaration of _copy_to_iter_mcsafe().

Fixes: 8780356ef6 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22 23:17:03 -07:00
Dan Williams
8780356ef6 x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()
Use the updated memcpy_mcsafe() implementation to define
copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
difference from typical copy_to_iter() is that the ITER_KVEC and
ITER_BVEC iterator types can fail to complete a full transfer.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15 08:32:42 +02:00
Al Viro
09cf698a59 new primitive: iov_iter_for_each_range()
For kvec and bvec: feeds segments to given callback as long as it
returns 0.  For iovec and pipe: fails.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-11 22:36:54 -04:00
Al Viro
faea13297e kill iov_shorten()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-11 17:23:43 -04:00
Al Viro
c43aeb1980 fix brown paperbag bug in inlined copy_..._iter()
"copied nothing" == "return 0", not "return full size".

Fixes: aa28de275a "iov_iter/hardening: move object size checks to inlined part"
Spotted-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-10 07:40:49 -04:00
Linus Torvalds
6a37e94009 Merge branch 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter hardening from Al Viro:
 "This is the iov_iter/uaccess/hardening pile.

  For one thing, it trims the inline part of copy_to_user/copy_from_user
  to the minimum that *does* need to be inlined - object size checks,
  basically. For another, it sanitizes the checks for iov_iter
  primitives. There are 4 groups of checks: access_ok(), might_fault(),
  object size and KASAN.

   - access_ok() had been verified by whoever had set the iov_iter up.
     However, that has happened in a function far away, so proving that
     there's no path to actual copying bypassing those checks is hard
     and proving that iov_iter has not been buggered in the meanwhile is
     also not pleasant. So we want those redone in actual
     copyin/copyout.

   - might_fault() is better off consolidated - we know whether it needs
     to be checked as soon as we enter iov_iter primitive and observe
     the iov_iter flavour. No need to wait until the copyin/copyout. The
     call chains are short enough to make sure we won't miss anything -
     in fact, it's more robust that way, since there are cases where we
     do e.g. forced fault-in before getting to copyin/copyout. It's not
     quite what we need to check (in particular, combination of
     iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a
     cause to skip checks), but that's for later series. For now let's
     keep might_fault().

   - KASAN checks belong in copyin/copyout - at the same level where
     other iov_iter flavours would've hit them in memcpy().

   - object size checks should apply to *all* iov_iter flavours, not
     just iovec-backed ones.

  There are two groups of primitives - one gets the kernel object
  described as pointer + size (copy_to_iter(), etc.) while another gets
  it as page + offset + size (copy_page_to_iter(), etc.)

  For the first group the checks are best done where we actually have a
  chance to find the object size. In other words, those belong in inline
  wrappers in uio.h, before calling into iov_iter.c. Same kind as we
  have for inlined part of copy_to_user().

  For the second group there is no object to look at - offset in page is
  just a number, it bears no type information. So we do them in the
  common helper called by iov_iter.c primitives of that kind. All it
  currently does is checking that we are not trying to access outside of
  the compound page; eventually we might want to add some sanity checks
  on the page involved.

  So the things we need in copyin/copyout part of iov_iter.c do not
  quite match anything in uaccess.h (we want no zeroing, we *do* want
  access_ok() and KASAN and we want no might_fault() or object size
  checks done on that level). OTOH, these needs are simple enough to
  provide a couple of helpers (static in iov_iter.c) doing just what we
  need..."

* 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  iov_iter: saner checks on copyin/copyout
  iov_iter: sanity checks for copy to/from page primitives
  iov_iter/hardening: move object size checks to inlined part
  copy_{to,from}_user(): consolidate object size checks
  copy_{from,to}_user(): move kasan checks and might_fault() out-of-line
2017-07-07 20:39:20 -07:00
Al Viro
aa28de275a iov_iter/hardening: move object size checks to inlined part
There we actually have useful information about object sizes.
Note: this patch has them done for all iov_iter flavours.
Right now we do them twice in iovec case, but that'll change
very shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29 22:21:22 -04:00
Dan Williams
0aed55af88 x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09 09:09:56 -07:00
Al Viro
27c0e3748e [iov_iter] new privimitive: iov_iter_revert()
opposite to iov_iter_advance(); the caller is responsible for never
using it to move back past the initial position.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-02 12:10:47 -04:00
Linus Torvalds
9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Al Viro
cbbd26b8b1 [iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 14:33:36 -05:00
Christoph Hellwig
d38499530e fs: decouple READ and WRITE from the block layer ops
Move READ and WRITE to kernel.h and don't define them in terms of block
layer ops; they are our generic data direction indicators these days
and have no more resemblance with the block layer ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01 09:43:26 -06:00
Linus Torvalds
fed41f7d03 Merge branch 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull splice fixups from Al Viro:
 "A couple of fixups for interaction of pipe-backed iov_iter with
  O_DIRECT reads + constification of a couple of primitives in uio.h
  missed by previous rounds.

  Kudos to davej - his fuzzing has caught those bugs"

* 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [btrfs] fix check_direct_IO() for non-iovec iterators
  constify iov_iter_count() and iter_is_iovec()
  fix ITER_PIPE interaction with direct_IO
2016-10-10 13:38:49 -07:00
Linus Torvalds
abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Al Viro
b57332b410 constify iov_iter_count() and iter_is_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10 13:57:37 -04:00
Al Viro
241699cd72 new iov_iter flavour: pipe-backed
iov_iter variant for passing data into pipe.  copy_to_iter()
copies data into page(s) it has allocated and stuffs them into
the pipe; copy_page_to_iter() stuffs there a reference to the
page given to it.  Both will try to coalesce if possible.
iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
and friends will do as copy_to_iter() would have and return the
pages where the data would've been copied.  iov_iter_advance()
will truncate everything past the spot it has advanced to.

New primitive: iov_iter_pipe(), used for initializing those.
pipe should be locked all along.

Running out of space acts as fault would for iovec-backed ones;
in other words, giving it to ->read_iter() may result in short
read if the pipe overflows, or -EFAULT if it happens with nothing
copied there.

In other words, ->read_iter() on those acts pretty much like
->splice_read().  Moreover, all generic_file_splice_read() users,
as well as many other ->splice_read() instances can be switched
to that scheme - that'll happen in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-05 18:23:36 -04:00