Commit graph

1462 commits

Jens Axboe
9ce85ef2cb io_uring: remove dead non-zero 'poll' check
Colin reports that Coverity complains about checking for poll being
non-zero after having dereferenced it multiple times. This is a valid
complaint, and actually a leftover from back when this code was based
on the aio poll code.

Kill the redundant check.

Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09 08:20:28 -06:00
Pavel Begunkov
8f487ef2cb io_uring: mitigate unlikely iopoll lag
We have requests like IORING_OP_FILES_UPDATE that don't go through
->iopoll_list but get completed in place under ->uring_lock, so after
dropping the lock io_iopoll_check() should expect that some CQEs might
have been completed in the meantime.

Currently such events aren't accounted for in @nr_events, and the loop
will continue to poll even when there are already enough CQEs. It isn't
a real problem, as it's unlikely to happen, but it's not nice either.
Just return early in this case; that is enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-08 14:07:43 -06:00
Pavel Begunkov
c32aace0cf io_uring: fix drain alloc fail return code
After a recent change io_drain_req() started to fail requests with
result=0 in case of allocation failure, where the result should be, and
previously was, -ENOMEM.

Fixes: 76cc33d791 ("io_uring: refactor io_req_defer()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e068110ac4293e0c56cfc4d280d0f22b9303ec08.1625682153.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-07 12:49:32 -06:00
Pavel Begunkov
e09ee51060 io_uring: fix exiting io_req_task_work_add leaks
If a caller enters io_req_task_work_add() without seeing PF_EXITING, it
sets a ->task_state bit and tries task_work_add(), which may fail by
that point. If that happens, the function tries to cancel the request.

However, in the meantime other io_req_task_work_add() callers may
arrive, see the bit already set, and leave their requests in the list,
where they will never be executed.

Don't propagate an error; instead clear the bit first and then fall
back all requests that we can splice from the list. The callback
functions have to be able to deal with PF_EXITING, so poll and apoll
were adjusted by changing io_poll_rewait().

Fixes: 7cbf1722d5 ("io_uring: provide FIFO ordering for task_work")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/060002f19f1fdbd130ba24aef818ea4d3080819b.1625142209.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-01 13:40:32 -06:00
Pavel Begunkov
5b0a6acc73 io_uring: simplify task_work func
Since we don't really use req->task_work anymore, get rid of it together
with the nasty ->func aliasing between ->io_task_work and ->task_work,
and hide ->fallback_node inside of io_task_work.

Also, as task_work is gone now, replace the callback type from
task_work_func_t to a function taking io_kiocb to avoid casting and
simplify code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-01 13:40:23 -06:00
Pavel Begunkov
9011bf9a13 io_uring: fix stuck fallback reqs
When task_work_add() fails, we use ->exit_task_work to queue the work.
That work is run only in the cancellation path, which happens either
when the ctx is dying or when one of the tasks with inflight requests
is exiting or executing. There is a good chance that such a request
would just get stuck in the list, potentially holding a file, all
io_uring rsrc recycling, or some other resources. Nothing terrible, it
will go away at some point, but we don't want to tie them up for longer
than needed.

Replace that hand-made ->exit_task_work with a delayed_work + llist
scheme inspired by fput_many().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-01 13:40:17 -06:00
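
A rough user-space sketch of the pattern referred to above (a lock-free
llist-style push list drained in one go by a deferred worker, the same
shape fput_many() uses with its delayed_work); all names and the direct
call to drain() are illustrative, not the io_uring implementation:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
};

static _Atomic(struct node *) pending;	/* llist-style head */

/* producer: lock-free push; the kernel side would be llist_add() */
static void push(struct node *n)
{
	n->next = atomic_load_explicit(&pending, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(&pending, &n->next, n,
						      memory_order_release,
						      memory_order_relaxed))
		;
}

/* deferred worker: grab the whole list at once, then process it;
 * in the kernel this would run from a delayed_work */
static void drain(void)
{
	struct node *n = atomic_exchange_explicit(&pending, NULL,
						  memory_order_acquire);

	while (n) {
		struct node *next = n->next;

		printf("processing request %d\n", n->id);
		free(n);
		n = next;
	}
}

int main(void)
{
	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));

		n->id = i;
		push(n);
	}
	drain();	/* stands in for the scheduled delayed_work firing */
	return 0;
}
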
Hao Xu
e149bd742b io_uring: code clean for kiocb_done()
A simple code cleanup for kiocb_done().

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
Hao Xu
915b3dde9b io_uring: spin in iopoll() only when reqs are in a single queue
We currently spin in iopoll() when the requests to be iopolled are for
the same file (device), while one device may have multiple hardware
queues. For example:

hw_queue_0     |    hw_queue_1
req(30us)           req(10us)

If we first spin on iopolling for hw_queue_0, the average latency would
be (30us + 30us) / 2 = 30us, while if we do round robin, the average
latency would be (30us + 10us) / 2 = 20us, since we reap the request in
hw_queue_1 in time. So it's better to spin only when the requests are
in the same hardware queue.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
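
A small illustrative sketch of the rule described above: allow
busy-spinning only when every pending iopoll request targets the same
hardware queue. The structures and the can_spin() helper are made up
for the example, not taken from fs/io_uring.c:

struct polled_req {
	int hw_queue;			/* queue the request was issued to */
	struct polled_req *next;
};

/* return 1 only if a single hardware queue serves all pending requests,
 * so spinning on it cannot delay reaping completions from another queue */
static int can_spin(const struct polled_req *list)
{
	const struct polled_req *r;
	int q;

	if (!list)
		return 0;
	q = list->hw_queue;
	for (r = list->next; r; r = r->next)
		if (r->hw_queue != q)
			return 0;	/* mixed queues: round robin instead */
	return 1;
}
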
Pavel Begunkov
99ebe4efbd io_uring: pre-initialise some of req fields
Most requests are allocated from an internal cache, so it's a waste of
time to fully initialise them every time. Instead, pre-init some of the
fields we can during the initial allocation (e.g. kmalloc(), see
io_alloc_req()) and keep them valid on request recycling. There are
four of them in this patch:

->ctx always stays the same
->link is NULL on free, it's an invariant
->result doesn't even need to be initialised, just a precaution
->async_data is now cleaned in io_dismantle_req(), as it's most likely
   never allocated.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/892ba0e71309bba9fe9e0142472330bbf9d8f05d.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
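
A sketch of the recycling idea, assuming a toy object cache; the field
names mirror the list above, but the cache and helpers are illustrative
rather than io_uring's actual allocator:

#include <stdlib.h>

struct req {
	void *ctx;		/* set once, always the same for this cache */
	struct req *link;	/* invariant: NULL while sitting in the cache */
	void *async_data;	/* cleaned on free, rarely ever allocated */
	int result;		/* always written before read, no init needed */
	struct req *next_free;
};

struct req_cache {
	void *ctx;
	struct req *free_list;
};

static struct req *req_alloc(struct req_cache *c)
{
	struct req *r = c->free_list;

	if (r) {			/* recycled: invariants already hold */
		c->free_list = r->next_free;
		return r;
	}
	r = calloc(1, sizeof(*r));	/* fresh: establish invariants once */
	if (r)
		r->ctx = c->ctx;
	return r;
}

static void req_free(struct req_cache *c, struct req *r)
{
	/* restore the invariants on the free path (the "dismantle" step),
	 * so the alloc fast path above can skip re-initialisation */
	r->link = NULL;
	free(r->async_data);
	r->async_data = NULL;
	r->next_free = c->free_list;
	c->free_list = r;
}
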
Pavel Begunkov
5182ed2e33 io_uring: refactor io_submit_flush_completions
Don't init req_batch before we actually need it. Also, add a small
cleanup for the req declaration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ad85512e12bd3a20d521e9782750300970e5afc8.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
Pavel Begunkov
4cfb25bf88 io_uring: optimise hot path restricted checks
Move likely/unlikely from io_check_restriction() specifically to the
ctx->restricted check, because as-is it doesn't do what it's supposed
to and makes the common path take an extra jump.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22bf70d0a543dfc935d7276bdc73081784e30698.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
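
An illustrative sketch of the hint placement, using __builtin_expect
directly (the kernel's likely()/unlikely() are thin wrappers around
it); the structures and helpers here are invented for the example:

#define unlikely(x)	__builtin_expect(!!(x), 0)

struct toy_ctx { int restricted; };

static int check_restriction_slow(const struct toy_ctx *ctx, int opcode)
{
	/* the full restriction table lookup would live here */
	return ctx && opcode >= 0;
}

static int check_req(const struct toy_ctx *ctx, int opcode)
{
	/* hint the rare condition itself, not the whole helper, so the
	 * common unrestricted path falls straight through */
	if (unlikely(ctx->restricted))
		return check_restriction_slow(ctx, opcode);
	return 1;
}
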
Pavel Begunkov
e5dc480d4e io_uring: remove not needed PF_EXITING check
Since cancellation got moved before exit_signals(), there is no one
left who can call io_run_task_work() with PF_EXITING set, so remove the
check. Note that __io_req_task_submit() still needs a similar check.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f7f305ececb1e6044ea649fb983ca754805bb884.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:40 -06:00
Pavel Begunkov
dd432ea520 io_uring: mainstream sqpoll task_work running
task_works are widely used, so place io_run_task_work() directly into
the main path of io_sq_thread(), and remove it from other places where
it's not needed anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/24eb5e35d519c590d3dffbd694b4c61a5fe49029.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Pavel Begunkov
b2d9c3da77 io_uring: refactor io_arm_poll_handler()
gcc 11 takes a weird path and duplicates most of io_arm_poll_handler()
for the READ and WRITE cases. Help it out and move all pollin vs
pollout specific bits under a single if-else, so there is no temptation
for this kind of unfolding.

before vs after:
   text    data     bss     dec     hex filename
  85362   12650       8   98020   17ee4 ./fs/io_uring.o
  85186   12650       8   97844   17e34 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1deea0037293a922a0358e2958384b2e42437885.1624739600.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Olivier Langlois
59b735aeeb io_uring: reduce latency by reissueing the operation
It is quite frequent that when an operation fails and returns EAGAIN,
the data becomes available between that failure and the call to
vfs_poll() done by io_arm_poll_handler().

Detecting the situation and reissuing the operation is much faster than
going ahead and pushing the operation to io-wq.

Performance improvement testing has been performed with:
Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll.

4 measurements have been taken:
1. The time it takes to process a read request when data is already available
2. The time it takes to process a read request by calling io_issue_sqe() twice, after vfs_poll() indicated that data was available
3. The time it takes to execute io_queue_async_work()
4. The time it takes to complete a read request asynchronously

2.25% of all the read operations used the new path.

All times are in nanoseconds.

ready data (baseline)
average	3657.94182918628
min	580
max	20098
stddev	1213.15975908162

reissue completion
average	7882.67567567568
min	2316
max	28811
stddev	1982.79172973284

insert io-wq time
average	8983.82276995305
min	3324
max	87816
stddev	2551.60056552038

async time completion
average	24670.4758861127
min	10758
max	102612
stddev	3483.92416873804

Conclusion:
On average, reissuing the sqe with the patched code is 1.1 uSec faster,
and in the worst-case scenario 59 uSec faster, than placing the request
on io-wq.

On average, completion time by reissuing the sqe with the patched code
is 16.79 uSec faster, and in the worst-case scenario 73.8 uSec faster,
than async completion.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e8441419bb1b8f3c3fcc607b2713efecdef2136.1624364038.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Jens Axboe
22634bc562 io_uring: add IOPOLL and reserved field checks to IORING_OP_UNLINKAT
We can't support IOPOLL with non-pollable request types, and we should
check for unused/reserved fields like we do for other request types.

Fixes: 14a1143b68 ("io_uring: add support for IORING_OP_UNLINKAT")
Cc: stable@vger.kernel.org
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Jens Axboe
ed7eb25922 io_uring: add IOPOLL and reserved field checks to IORING_OP_RENAMEAT
We can't support IOPOLL with non-pollable request types, and we should
check for unused/reserved fields like we do for other request types.

Fixes: 80a261fd00 ("io_uring: add support for IORING_OP_RENAMEAT")
Cc: stable@vger.kernel.org
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Pavel Begunkov
12dcb58ac7 io_uring: refactor io_openat2()
Put the do_filp_open() failure path of io_openat2() under a single if,
deduplicating put_unused_fd(); this makes it look better and helps the
hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f4c84d25c049d0af2adc19c703bbfef607200209.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Pavel Begunkov
16340eab61 io_uring: update sqe layout build checks
Add missing BUILD_BUG_SQE_ELEM() for ->buf_group verifying that SQE
layout doesn't change.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1f9d21bd74599b856b3a632be4c23ffa184a3ef0.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
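
A user-space analogue of such layout checks, pinning both the offset
and the size of each field with _Static_assert() + offsetof() so an
accidental reorder fails the build. The struct below is a toy stand-in,
not the real struct io_uring_sqe:

#include <stddef.h>
#include <stdint.h>

struct toy_sqe {
	uint8_t  opcode;	/* offset 0 */
	uint8_t  flags;		/* offset 1 */
	uint16_t buf_group;	/* offset 2 */
	uint32_t len;		/* offset 4 */
};

#define CHECK_SQE_ELEM(off, type, member)				      \
	_Static_assert(offsetof(struct toy_sqe, member) == (off) &&	      \
		       sizeof(((struct toy_sqe *)0)->member) == sizeof(type), \
		       "toy_sqe layout changed: " #member)

CHECK_SQE_ELEM(0, uint8_t,  opcode);
CHECK_SQE_ELEM(1, uint8_t,  flags);
CHECK_SQE_ELEM(2, uint16_t, buf_group);
CHECK_SQE_ELEM(4, uint32_t, len);
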
Pavel Begunkov
fe7e325750 io_uring: fix code style problems
Fix a bunch of problems mostly found by checkpatch.pl

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cfaf9a2f27b43934144fe9422a916bd327099f44.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Pavel Begunkov
1a924a8082 io_uring: refactor io_sq_thread()
Move needs_sched declaration into the block where it's used, so it's
harder to misuse/wrongfully reuse.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4a07db1353ee38b924dd1b45394cf8e746130b4.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:39 -06:00
Pavel Begunkov
948e19479c io_uring: don't change sqpoll creds if not needed
SQPOLL doesn't need to change creds if it's not submitting requests.
Move creds overriding into __io_sq_thread() after checking if there are
SQEs pending.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c54368da2357ac539e0a333f7cfff70d5fb045b2.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 14:15:38 -06:00
Olivier Langlois
4ce8ad95f0 io_uring: Create define to modify a SQPOLL parameter
The magic number used to cap the number of entries extracted from an
io_uring instance's SQ before moving on to the other instances is an
interesting parameter to experiment with.

A define has been created to make it easy to change its value from a
single location.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b401640063e77ad3e9f921e09c9b3ac10a8bb923.1624473200.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-23 20:38:21 -06:00
Olivier Langlois
9971350177 io_uring: Fix race condition when sqp thread goes to sleep
If an asynchronous completion happens before the task is preparing
itself to wait and set its state to TASK_INTERRUPTIBLE, the completion
will not wake up the sqp thread.

Cc: stable@vger.kernel.org
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d1419dc32ec6a97b453bee34dc03fa6a02797142.1624473200.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-23 20:38:21 -06:00
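
A generic user-space illustration of this class of race: a waiter must
re-check its wakeup condition after publishing its intent to sleep. The
kernel side uses prepare_to_wait()/schedule(); the pthread condvar
version below is only an analogy, with made-up names:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static bool work_pending;

/* completion side: raise the flag and wake the poller */
void complete_one(void)
{
	pthread_mutex_lock(&lock);
	work_pending = true;
	pthread_cond_signal(&wake);
	pthread_mutex_unlock(&lock);
}

/* poller side: re-checking work_pending under the lock before sleeping
 * is what prevents the lost wakeup; sleeping unconditionally after a
 * completion already arrived would reproduce the bug */
void wait_for_work(void)
{
	pthread_mutex_lock(&lock);
	while (!work_pending)
		pthread_cond_wait(&wake, &lock);
	work_pending = false;
	pthread_mutex_unlock(&lock);
}
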
Pavel Begunkov
7a778f9dc3 io_uring: improve in tctx_task_work() resubmission
If task_state is cleared, io_req_task_work_add() will take the slow
path: adding a task_work, setting the task_state, waking up the task
and so on. Not to mention it's expensive. tctx_task_work() first clears
the state and then executes all the queued work items, so if any of
them resubmits or adds new task_work items, it unnecessarily goes
through the slow path of io_req_task_work_add().

Let's clear ->task_state at the end instead. We still have to check
->task_list for emptiness afterwards to synchronise with
io_req_task_work_add(); do that, and set the state back if we're going
to retry, because clearing a task_state that isn't ours on the next
iteration would be buggy.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
16f7207038 io_uring: don't resched with empty task_list
Entering tctx_task_work() with an empty task_list is a strange
scenario that can happen only on rare occasions during task exit, so
let's not check for task_list emptiness in advance and do it do-while
style. The code is still correct for the empty case, it just does extra
work that we don't care about.

As an extra step, do the check before cond_resched(), so we don't
resched if there is nothing to execute.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4173e288e69793d03c7d7ce826f9d28afba718a.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
c6538be9e4 io_uring: refactor tctx task_work list splicing
We don't need a full copy of tctx->task_list in tctx_task_work(), only
its first node, so just assign node directly.

Taking into account that task_works run in the context of a task, it's
very unlikely to first see a non-empty tctx->task_list and then splice
it empty; that can only happen with task_work cancellations, which is a
non-normal slow path anyway. Hence, get rid of the check at the end,
it's there not for validity but for "performance" purposes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d076c83fedb8253baf43acb23b8fafd7c5da1714.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
ebd0df2e63 io_uring: optimise task_work submit flushing
tctx_task_work() tries to fetch the next batch of requests, but before
that it flushes completions from the previous batch, which may be
sub-optimal. E.g. io_req_task_queue() executes the head of a link,
where all the linked requests may be enqueued through that same
io_req_task_queue(), and there are more cases like that.

Do the flushing at the very end instead, so it can cache completions of
several waves of a single tctx_task_work() and flush them all at once.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cac83934e4fbce520ff8025c3524398b3ae0270.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
3f18407dc6 io_uring: inline __tctx_task_work()
Inline __tctx_task_work() into tctx_task_work() in preparation for
further optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f9c05c4bc9763af7bd8e25ebc3c5f7b6f69148f8.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
a3dbdf54da io_uring: refactor io_get_sequence()
Clean up io_get_sequence() and add a comment describing the magic around
sequence correction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f55dc409936b8afa4698d24b8677a34d31077ccb.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
c854357bc1 io_uring: clean all flags in io_clean_op() at once
Clean all flags in io_clean_op() at the end in one operation; it will
save us a couple of operations and some binary size.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8efe1f022a037f74e7fe497c69fb554d59bfeaf.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
1dacb4df4e io_uring: simplify iovec freeing in io_clean_op()
We don't get REQ_F_NEED_CLEANUP for rw unless ->free_iovec is set, so
remove the optimisation of NULL-checking it inline; kfree() will take
care of it if that ever turns out to be the case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a233dc655d3d45bd4f69b73d55a61de46d914415.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
b8e64b5300 io_uring: track request creds with a flag
Currently, if req->creds is not NULL, then there are creds assigned.
Track the invariant with a new flag in req->flags. No need to clear the
field at init, and also cleanup can be efficiently moved into
io_clean_op().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5f8baeb8d3b909487f555542350e2eac97005556.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
c10d1f986b io_uring: move creds from io-wq work to io_kiocb
io-wq doesn't have anything to do with creds anymore, so move ->creds
from struct io_wq_work into the request (aka struct io_kiocb).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8520c72ab8b8f4b96db12a228a2ab4c094ae64e1.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Pavel Begunkov
2a2758f26d io_uring: refactor io_submit_flush_completions()
struct io_comp_state is always contained in struct io_ring_ctx, so
don't pass them into io_submit_flush_completions() separately; that
makes the interface cleaner and simplifies it for the compiler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/44d6ca57003a82484338e95197024dbd65a1b376.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18 09:22:02 -06:00
Jens Axboe
fe76421d1d io_uring: allow user configurable IO thread CPU affinity
io-wq defaults to per-node masks for IO workers. This works fine by
default, but isn't particularly handy for workloads that prefer more
specific affinities, for either performance or isolation reasons.

This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU
mask that is then applied to IO thread workers, and an
IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the
default of per-node.

Note that no care is given to existing IO threads, they will need to go
through a reschedule before the affinity is correct if they are already
running or sleeping.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-17 10:25:50 -06:00
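
A minimal usage sketch from the application side, assuming a liburing
version that wraps these opcodes as io_uring_register_iowq_aff() and
io_uring_unregister_iowq_aff(); error handling is trimmed and pinning
to CPU 0 is an arbitrary choice:

#define _GNU_SOURCE
#include <liburing.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	cpu_set_t mask;
	int ret;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);		/* restrict io-wq workers to CPU 0 */

	ret = io_uring_register_iowq_aff(&ring, sizeof(mask), &mask);
	printf("register iowq affinity: %d\n", ret);

	/* ... submit requests; workers created or rescheduled from here
	 * on should honour the mask ... */

	io_uring_unregister_iowq_aff(&ring);	/* back to per-node default */
	io_uring_queue_exit(&ring);
	return 0;
}
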
Olivier Langlois
236daeae36 io_uring: Add to traces the req pointer when available
The req pointer uniquely identifies a specific request. Having it in
traces can provide valuable insight that is not possible to get if the
calling process is reusing the same user_data value.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16 06:41:46 -06:00
Pavel Begunkov
2335f6f5dd io_uring: optimise io_commit_cqring()
In most cases io_commit_cqring() is just an smp_store_release(), and
it's hot enough, especially for IRQ rw, that we want to save the cost
of a function call. Mark it inline and extract a non-inlined slow path
that does the drain and timeout flushing. The inlined part is slim
enough not to cause binary bloat.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7350f8b6b92caa50a48a80be39909f0d83eddd93.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:44:34 -06:00
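
A generic sketch of that shape (small inline fast path, out-of-line
slow path for the rare drain/timeout work); the struct and helpers are
illustrative, not the fs/io_uring.c code:

#include <stdatomic.h>
#include <stdbool.h>

struct toy_ring {
	_Atomic unsigned int tail;	/* published with a release store */
	unsigned int cached_tail;
	bool flush_needed;		/* drains/timeouts pending, rarely true */
};

__attribute__((noinline))
static void commit_slowpath(struct toy_ring *r)
{
	/* flush drained requests, offset-mode timeouts, etc. */
	r->flush_needed = false;
}

static inline void commit(struct toy_ring *r)
{
	if (r->flush_needed)		/* rare */
		commit_slowpath(r);
	/* the hot part stays a single release store of the new tail */
	atomic_store_explicit(&r->tail, r->cached_tail, memory_order_release);
}
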
Pavel Begunkov
3c19966d37 io_uring: shove more drain bits out of hot path
Place all drain_next logic into io_drain_req(), so it's never executed
if there were no drained requests before. The only thing we need is to
set ->drain_active when we see a request with IOSQE_IO_DRAIN; do that
in io_init_req(), where the flags are definitely in registers.

Also, all drain-related code is now encapsulated in io_drain_req(),
which makes it cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68bf4f7395ddaafbf1a26bd97b57d57d45a9f900.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:44:34 -06:00
Pavel Begunkov
10c669040e io_uring: switch !DRAIN fast path when possible
->drain_used is a one-way flag, which is not optimal if users use
DRAIN only very rarely. However, we can just clear it in io_drain_req()
when all previously drained requests are gone. Also rename the flag to
reflect the change and to be clearer about it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7f37a240857546a94df6348507edddacab150460.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:44:33 -06:00
Pavel Begunkov
27f6b318de io_uring: fix min types mismatch in table alloc
fs/io_uring.c: In function 'io_alloc_page_table':
include/linux/minmax.h:20:28: warning: comparison of distinct pointer
	types lacks a cast

Cast everything to size_t using min_t.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 9123c8ffce ("io_uring: add helpers for 2 level table alloc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/50f420a956bca070a43810d4a805293ed54f39d8.1623759527.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:40:17 -06:00
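
For illustration, the same class of warning and the fix in a
stand-alone form; the min_t() below is a simplified stand-in for the
kernel macro, and the function is invented for the example:

#include <stddef.h>

/* simplified stand-in for the kernel's min_t(): force both operands to
 * one type before comparing, so mixed-type comparisons don't warn */
#define min_t(type, x, y) ({		\
	type __x = (x);			\
	type __y = (y);			\
	__x < __y ? __x : __y; })

size_t table_chunk(size_t total_entries, int entries_per_page)
{
	/* a type-checking min(total_entries, entries_per_page) would warn
	 * about comparing distinct types; casting via min_t() does not */
	return min_t(size_t, total_entries, entries_per_page);
}
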
Fam Zheng
dd9ae8a0b2 io_uring: Fix comment of io_get_sqe
The sqe_ptr argument has been gone since 709b302fad ("io_uring:
simplify io_get_sqring", 2020-04-08), when it was made the return value
of the function. Update the comment accordingly.

Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210604164256.12242-1-fam.zheng@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:39:16 -06:00
Pavel Begunkov
441b8a7803 io_uring: optimise non-drain path
Replace the drain checks with a one-way flag set upon seeing the first
IOSQE_IO_DRAIN request. There are several places where it cuts cycles
well:

1) It's much faster than the fast check with two conditions in
io_drain_req(), including the pretty complex list_empty_careful().

2) We can mark io_queue_sqe() inline now, which is a huge win.

3) It replaces the timeout and drain checks in io_commit_cqring() with
a single flags test. It's also great not to touch ->defer_list there
without a reason, limiting cache bouncing.

It adds a small amount of overhead to the drain path, but it's
negligible. The main nuisance is that once it meets any DRAIN request
in an io_uring instance's lifetime, it will _always_ go through the
slower path, so applications without drain and without offset-mode
timeouts are preferable. The overhead in that case is not big, but it's
worth bearing in mind.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98d2fff8c4da5144bb0d08499f591d4768128ea3.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
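
A sketch of the one-way flag idea in isolation: the first
IOSQE_IO_DRAIN request flips a ctx-wide flag, and the submission fast
path afterwards only tests that single flag. The names and surrounding
structure are illustrative, not the real io_uring code:

#include <stdbool.h>

#define TOY_IO_DRAIN	0x1u

struct toy_ctx {
	bool drain_active;	/* one-way: set on the first DRAIN request */
};

struct toy_req {
	unsigned int flags;
};

/* init path: the only place that sets the flag */
static void init_req(struct toy_ctx *ctx, struct toy_req *req,
		     unsigned int sqe_flags)
{
	req->flags = sqe_flags;
	if (sqe_flags & TOY_IO_DRAIN)
		ctx->drain_active = true;
}

/* hot submission path: one flag test instead of list/timeout checks */
static bool maybe_drain(struct toy_ctx *ctx, struct toy_req *req)
{
	if (!ctx->drain_active)
		return false;	/* common drain-free case */
	/* slow path: sequence checks, ->defer_list handling, and possibly
	 * clearing drain_active again would go here */
	(void)req;
	return true;
}
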
Pavel Begunkov
76cc33d791 io_uring: refactor io_req_defer()
Rename io_req_defer() to io_drain_req() and refactor it, decoupling it
from io_queue_sqe() error handling and preparing for coming
optimisations. Also, prioritise the non-IOSQE_ASYNC path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4f17dd56e7fbe52d1866f8acd8efe3284d2bebcb.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
0499e582aa io_uring: move uring_lock location
->uring_lock is predominantly used for submission, even though it
protects many other things like iopoll, registration, selected bufs,
and more. And it's placed together with ->cq_wait, which is poked on
the completion and CQ-waiting sides. Move them apart: ->uring_lock goes
to the submission data, and cq_wait to the completion-related chunk.
The latter requires some reshuffling so that everything needed by
io_cqring_ev_posted*() is in one cacheline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dea5e845caee4c98aa0922b46d713154d81f7bd8.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
311997b3fc io_uring: wait heads renaming
We use several wait_queue_head's for different purposes, but the
naming is confusing. First rename ctx->cq_wait to ctx->poll_wait,
because this one is used for polling an io_uring instance. Then rename
ctx->wait to ctx->cq_wait, which is responsible for CQE waiting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/47b97a097780c86c67b20b6ccc4e077523dce682.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
5ed7a37d21 io_uring: clean up check_overflow flag
There are no users of ->sq_check_overflow; only ->cq_check_overflow is
used. Combine them and move the result out of the completion-related
part of struct io_ring_ctx.

A not-so-obvious benefit is fitting all completion-side fields into a
single cacheline. It was taking 2 cachelines before, with 56B of
padding, and io_cqring_ev_posted*() was still touching both of them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/25927394964df31d113e3c729416af573afff5f5.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
5e159204d7 io_uring: small io_submit_sqe() optimisation
submit_state.link is used only to assemble a link and not for the
actual submission, so clear it before io_queue_sqe() in io_submit_sqe()
while it's hot and in caches and queueing doesn't spoil it. It may also
help the compiler with spilling or other optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1579939426f3ad6b55af3005b1389bbbed7d780d.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
f18ee4cf0a io_uring: optimise completion timeout flushing
io_commit_cqring() might be very hot and we definitely don't want to
touch ->timeout_list there, because 1) it's shared with the submission
side, so it might lead to cache bouncing, and 2) it may need to load an
extra cache line, especially for IRQ completions.

We're interested in it on the completion side only when there are
offset-mode timeouts, which are not that popular. Replace the
list_empty(->timeout_list) hot path check with a new one-way flag,
which is set when we prepare the first offset-mode timeout.

note: the flag sits in the same cacheline as ->rings, which is briefly used after it

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4892ec68b71a69f92ffbea4a1499be3ec0d463b.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00
Pavel Begunkov
15641e4270 io_uring: don't cache number of dropped SQEs
Kill ->cached_sq_dropped and wire the DRAIN sequence number correction
via ->cq_extra, which is there exactly for that purpose. The
user-visible dropped counter will be populated by incrementing it
directly instead of keeping a copy, similarly to what was done not so
long ago with cq_overflow.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/088aceb2707a534d531e2770267c4498e0507cc1.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-15 15:38:40 -06:00