Add a new registration opcode IORING_REGISTER_CLOCK, which allows the
user to select which clock id it wants to use with CQ waiting timeouts.
It only allows a subset of all posix clocks and currently supports
CLOCK_MONOTONIC and CLOCK_BOOTTIME.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98f2bc8a3c36cdf8f0e6a275245e81e903459703.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We write at most IORING_OP_LAST entries in the probe buffer, so we don't
need to allocate temporary space for more than that. As a side effect,
we no longer can overflow "size".
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240619020620.5301-3-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_probe checks io_issue_def->not_supported, but we never really set
that field, as we mark non-supported functions through a specific ->prep
handler. This means we end up returning IO_URING_OP_SUPPORTED, even for
disabled operations. Fix it by just checking the prep handler itself.
Fixes: 66f4af93da ("io_uring: add support for probing opcodes")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240619020620.5301-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is pretty nicely abstracted already, but let's move it to a separate
file rather than have it in the main io_uring file. With that, we can
also move the io_ev_fd struct and enum out of global scope.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In some ways, it just "happens to work" currently with using the ops
field for both the free and signaling bit. But it depends on ordering
of operations in terms of freeing and signaling. Clean it up and use the
usual refs == 0 under RCU read side lock to determine if the ev_fd is
still valid, and use the reference to gate the freeing as well.
Fixes: 21a091b970 ("io_uring: signal registered eventfd to process deferred task work")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The io_register_iowq_max_workers() function calls io_put_sq_data(),
which acquires the sqd->lock without releasing the uring_lock.
Similar to the commit 009ad9f0c6 ("io_uring: drop ctx->uring_lock
before acquiring sqd->lock"), this can lead to a potential deadlock
situation.
To resolve this issue, the uring_lock is released before calling
io_put_sq_data(), and then it is re-acquired after the function call.
This change ensures that the locks are acquired in the correct
order, preventing the possibility of a deadlock.
Suggested-by: Maximilian Heyne <mheyne@amazon.de>
Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are a few of those:
io_uring/fdinfo.c:170:16: warning: declaration shadows a local variable [-Wshadow]
170 | struct file *f = io_file_from_index(&ctx->file_table, i);
| ^
io_uring/fdinfo.c:53:67: note: previous declaration is here
53 | __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
| ^
io_uring/cancel.c:187:25: warning: declaration shadows a local variable [-Wshadow]
187 | struct io_uring_task *tctx = node->task->io_uring;
| ^
io_uring/cancel.c:166:31: note: previous declaration is here
166 | struct io_uring_task *tctx,
| ^
io_uring/register.c:371:25: warning: declaration shadows a local variable [-Wshadow]
371 | struct io_uring_task *tctx = node->task->io_uring;
| ^
io_uring/register.c:312:24: note: previous declaration is here
312 | struct io_uring_task *tctx = NULL;
| ^
and a simple cleanup gets rid of them. For the fdinfo case, make a
distinction between the file being passed in (for the ring), and the
registered files we iterate. For the other two cases, just get rid of
shadowed variable, there's no reason to have a new one.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This adds an api to register and unregister the napi for io-uring. If
the arg value is specified when unregistering, the current napi setting
for the busy poll timeout is copied into the user structure. If this is
not required, NULL can be passed as the arg value.
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230608163839.2891748-7-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add compat.h include to avoid a potential build issue:
io_uring/register.c:281:6: error: call to undeclared function 'in_compat_syscall'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
if (in_compat_syscall()) {
^
1 warning generated.
io_uring/register.c:282:9: error: call to undeclared function 'compat_get_bitmap'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
ret = compat_get_bitmap(cpumask_bits(new_mask),
^
Fixes: c43203154d ("io_uring/register: move io_uring_register(2) related code to register.c")
Reported-by: Manu Bretelle <chantra@meta.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The tail of the provided ring buffer is shared between the kernel and
the application, but the head is private to the kernel as the
application doesn't need to see it. However, this also prevents the
application from knowing how many buffers the kernel has consumed.
Usually this is fine, as the information is inherently racy in that
the kernel could be consuming buffers continually, but for cleanup
purposes it may be relevant to know how many buffers are still left
in the ring.
Add IORING_REGISTER_PBUF_STATUS which will return status for a given
provided buffer ring. Right now it just returns the head, but space
is reserved for more information later in, if needed.
Link: https://github.com/axboe/liburing/discussions/1020
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Most of this code is basically self contained, move it out of the core
io_uring file to bring a bit more separation to the registration related
bits. This moves another ~10% of the code into register.c.
Signed-off-by: Jens Axboe <axboe@kernel.dk>