linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-28 21:33:52 +00:00

Author	SHA1	Message	Date
Jens Axboe	b5311dbc2c	io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler This flag should not be persistent across retries, so ensure we clear it before potentially attemting a retry. Fixes: `c3f9109dbc` ("io_uring/kbuf: flag request if buffer pool is empty after buffer pick") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-07 13:22:05 -07:00
Pavel Begunkov	1a8ec63b2b	io_uring: fix io_queue_proc modifying req->flags With multiple poll entries __io_queue_proc() might be running in parallel with poll handlers and possibly task_work, we should not be carelessly modifying req->flags there. io_poll_double_prepare() handles a similar case with locking but it's much easier to move it into __io_arm_poll_handler(). Cc: stable@vger.kernel.org Fixes: `595e52284d` ("io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/455cc49e38cf32026fa1b49670be8c162c2cb583.1709834755.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-07 11:10:28 -07:00
Pavel Begunkov	70581dcd06	io_uring: fix mshot read defer taskrun cqe posting We can't post CQEs from io-wq with DEFER_TASKRUN set, normal completions are handled but aux should be explicitly disallowed by opcode handlers. Cc: stable@vger.kernel.org Fixes: `fc68fcda04` ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6fb7cba6f5366da25f4d3eb95273f062309d97fa.1709740837.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-07 06:30:33 -07:00
Dan Carpenter	8ede3db506	io_uring/net: fix overflow check in io_recvmsg_mshot_prep() The "controllen" variable is type size_t (unsigned long). Casting it to int could lead to an integer underflow. The check_add_overflow() function considers the type of the destination which is type int. If we add two positive values and the result cannot fit in an integer then that's counted as an overflow. However, if we cast "controllen" to an int and it turns negative, then negative values can fit into an int type so there is no overflow. Good: 100 + (unsigned long)-4 = 96 <-- overflow Bad: 100 + (int)-4 = 96 <-- no overflow I deleted the cast of the sizeof() as well. That's not a bug but the cast is unnecessary. Fixes: `9b0fc3c054` ("io_uring: fix types in io_recvmsg_multishot_overflow") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-04 16:33:15 -07:00
Muhammad Usama Anjum	86bcacc957	io_uring/net: correct the type of variable The namelen is of type int. It shouldn't be made size_t which is unsigned. The signed number is needed for error checking before use. Fixes: `c55978024d` ("io_uring/net: move receive multishot out of the generic msghdr path") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240301144349.2807544-1-usama.anjum@collabora.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-04 16:33:11 -07:00
Xiaobing Li	3fcb9d1720	io_uring/sqpoll: statistics of the true utilization of sq threads Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo. Variable description: "work_time" in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. "total_time" represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total. The test tool is fio, and its parameters are as follows: [global] ioengine=io_uring direct=1 group_reporting bs=128k norandommap=1 randrepeat=0 refill_buffers ramp_time=30s time_based runtime=1m clocksource=clock_gettime overwrite=1 log_avg_msec=1000 numjobs=1 [disk0] filename=/dev/nvme0n1 rw=read iodepth=16 hipri sqthread_poll=1 The test results are as follows: Every 2.0s: cat /proc/9230/fdinfo/6 \| grep -E Sq SqMask: 0x3 SqHead: 3197153 SqTail: 3197153 CachedSqHead: 3197153 SqThread: 9231 SqThreadCpu: 11 SqTotalTime: 18099614 SqWorkTime: 16748316 The test results corresponding to different iodepths are as follows: \|-----------\|-------\|-------\|-------\|------\|-------\| \| iodepth \| 1 \| 4 \| 8 \| 16 \| 64 \| \|-----------\|-------\|-------\|-------\|------\|-------\| \|utilization\| 2.9% \| 8.8% \| 10.9% \| 92.9%\| 84.4% \| \|-----------\|-------\|-------\|-------\|------\|-------\| \| idle \| 97.1% \| 91.2% \| 89.1% \| 7.1% \| 15.6% \| \|-----------\|-------\|-------\|-------\|------\|-------\| Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com> Link: https://lore.kernel.org/r/20240228091251.543383-1-xiaobing.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-01 06:28:19 -07:00
Jens Axboe	eb18c29dd2	io_uring/net: move recv/recvmsg flags out of retry loop The flags don't change, just intialize them once rather than every loop for multishot. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-03-01 06:28:19 -07:00
Jens Axboe	c3f9109dbc	io_uring/kbuf: flag request if buffer pool is empty after buffer pick Normally we do an extra roundtrip for retries even if the buffer pool has depleted, as we don't check that upfront. Rather than add this check, have the buffer selection methods mark the request with REQ_F_BL_EMPTY if the used buffer group is out of buffers after this selection. This is very cheap to do once we're all the way inside there anyway, and it gives the caller a chance to make better decisions on how to proceed. For example, recv/recvmsg multishot could check this flag when it decides whether to keep receiving or not. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-27 11:52:45 -07:00
Jens Axboe	792060de8b	io_uring/net: improve the usercopy for sendmsg/recvmsg We're spending a considerable amount of the sendmsg/recvmsg time just copying in the message header. And for provided buffers, the known single entry iovec. Be a bit smarter about it and enable/disable user access around our copying. In a test case that does both sendmsg and recvmsg, the runtime before this change (averaged over multiple runs, very stable times however): Kernel Time Diff ==================================== -git 4720 usec -git+commit 4311 usec -8.7% and looking at a profile diff, we see the following: 0.25% +9.33% [kernel.kallsyms] [k] _copy_from_user 4.47% -3.32% [kernel.kallsyms] [k] __io_msg_copy_hdr.constprop.0 where we drop more than 9% of _copy_from_user() time, and consequently add time to __io_msg_copy_hdr() where the copies are now attributed to, but with a net win of 6%. In comparison, the same test case with send/recv runs in 3745 usec, which is (expectedly) still quite a bit faster. But at least sendmsg/recvmsg is now only ~13% slower, where it was ~21% slower before. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-27 11:16:00 -07:00
Jens Axboe	c55978024d	io_uring/net: move receive multishot out of the generic msghdr path Move the actual user_msghdr / compat_msghdr into the send and receive sides, respectively, so we can move the uaddr receive handling into its own handler, and ditto the multishot with buffer selection logic. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-27 11:09:20 -07:00
Jens Axboe	52307ac4f2	io_uring/net: unify how recvmsg and sendmsg copy in the msghdr For recvmsg, we roll our own since we support buffer selections. This isn't the case for sendmsg right now, but in preparation for doing so, make the recvmsg copy helpers generic so we can call them from the sendmsg side as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-27 09:56:50 -07:00
Jens Axboe	b4ccc4dd13	io_uring/napi: enable even with a timeout of 0 1 usec is not as short as it used to be, and it makes sense to allow 0 for a busy poll timeout - this means just do one loop to check if we have anything available. Add a separate ->napi_enabled to check if napi has been enabled or not. While at it, move the writing of the ctx napi values after we've copied the old values back to userspace. This ensures that if the call fails, we'll be in the same state as we were before, rather than some indeterminate state. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-15 15:37:28 -07:00
Jens Axboe	871760eb7a	io_uring: kill stale comment for io_cqring_overflow_kill() This function now deals only with discarding overflow entries on ring free and exit, and it no longer returns whether we successfully flushed all entries as there's no CQE posting involved anymore. Kill the outdated comment. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-15 14:04:56 -07:00
Jens Axboe	c8d8fc3b2d	io_uring/sqpoll: use the correct check for pending task_work A previous commit moved to using just the private task_work list for SQPOLL, but it neglected to update the check for whether we have pending task_work. Normally this is fine as we'll attempt to run it unconditionally, but if we race with going to sleep AND task_work being added, then we certainly need the right check here. Fixes: `af5d68f889` ("io_uring/sqpoll: manage task_work privately") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-14 13:57:20 -07:00
Jens Axboe	78f9b61bd8	io_uring: wake SQPOLL task when task_work is added to an empty queue If there's no current work on the list, we still need to potentially wake the SQPOLL task if it is sleeping. This is ordered with the wait queue addition in sqpoll, which adds to the wait queue before checking for pending work items. Fixes: `af5d68f889` ("io_uring/sqpoll: manage task_work privately") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-14 13:56:08 -07:00
Jens Axboe	428f138268	io_uring/napi: ensure napi polling is aborted when work is available While testing io_uring NAPI with DEFER_TASKRUN, I ran into slowdowns and stalls in packet delivery. Turns out that while io_napi_busy_loop_should_end() aborts appropriately on regular task_work, it does not abort if we have local task_work pending. Move io_has_work() into the private io_uring.h header, and gate whether we should continue polling on that as well. This makes NAPI polling on send/receive work as designed with IORING_SETUP_DEFER_TASKRUN as well. Fixes: `8d0c12a80c` ("io-uring: add napi busy poll support") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-14 13:01:25 -07:00
Kuniyuki Iwashima	3fb1764c6b	io_uring: Don't include af_unix.h. Changes to AF_UNIX trigger rebuild of io_uring, but io_uring does not use AF_UNIX anymore. Let's not include af_unix.h and instead include necessary headers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240212234236.63714-1-kuniyu@amazon.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-12 19:02:11 -07:00
Stefan Roesch	ef1186c1a8	io_uring: add register/unregister napi function This adds an api to register and unregister the napi for io-uring. If the arg value is specified when unregistering, the current napi setting for the busy poll timeout is copied into the user structure. If this is not required, NULL can be passed as the arg value. Signed-off-by: Stefan Roesch <shr@devkernel.io> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230608163839.2891748-7-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 11:54:32 -07:00
Stefan Roesch	ff183d427d	io-uring: add sqpoll support for napi busy poll This adds the sqpoll support to the io-uring napi. Signed-off-by: Stefan Roesch <shr@devkernel.io> Suggested-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/20230608163839.2891748-6-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 11:54:28 -07:00
Stefan Roesch	8d0c12a80c	io-uring: add napi busy poll support This adds the napi busy polling support in io_uring.c. It adds a new napi_list to the io_ring_ctx structure. This list contains the list of napi_id's that are currently enabled for busy polling. The list is synchronized by the new napi_lock spin lock. The current default napi busy polling time is stored in napi_busy_poll_to. If napi busy polling is not enabled, the value is 0. In addition there is also a hash table. The hash table store the napi id and the pointer to the above list nodes. The hash table is used to speed up the lookup to the list elements. The hash table is synchronized with rcu. The NAPI_TIMEOUT is stored as a timeout to make sure that the time a napi entry is stored in the napi list is limited. The busy poll timeout is also stored as part of the io_wait_queue. This is necessary as for sq polling the poll interval needs to be adjusted and the napi callback allows only to pass in one value. This has been tested with two simple programs from the liburing library repository: the napi client and the napi server program. The client sends a request, which has a timestamp in its payload and the server replies with the same payload. The client calculates the roundtrip time and stores it to calculate the results. The client is running on host1 and the server is running on host 2 (in the same rack). The measured times below are roundtrip times. They are average times over 5 runs each. Each run measures 1 million roundtrips. no rx coal rx coal: frames=88,usecs=33 Default 57us 56us client_poll=100us 47us 46us server_poll=100us 51us 46us client_poll=100us+ 40us 40us server_poll=100us client_poll=100us+ 41us 39us server_poll=100us+ prefer napi busy poll on client client_poll=100us+ 41us 39us server_poll=100us+ prefer napi busy poll on server client_poll=100us+ 41us 39us server_poll=100us+ prefer napi busy poll on client + server Signed-off-by: Stefan Roesch <shr@devkernel.io> Suggested-by: Olivier Langlois <olivier@trillion01.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230608163839.2891748-5-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 11:54:19 -07:00
Stefan Roesch	405b4dc14b	io-uring: move io_wait_queue definition to header file This moves the definition of the io_wait_queue structure to the header file so it can be also used from other files. Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20230608163839.2891748-4-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 11:54:12 -07:00
Jens Axboe	adaad27980	Merge branch 'for-io_uring-add-napi-busy-polling-support' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into for-6.9/io_uring Pull netdev side of the io_uring napi support. * 'for-io_uring-add-napi-busy-polling-support' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux: net: add napi_busy_loop_rcu() net: split off __napi_busy_poll from napi_busy_poll	2024-02-09 11:53:39 -07:00
Stefan Roesch	b4e8ae5c8c	net: add napi_busy_loop_rcu() This adds the napi_busy_loop_rcu() function. This function assumes that the calling function is already holding the rcu read lock and napi_busy_loop() does not need to take the rcu read lock. Add a NAPI_F_NO_SCHED flag, which tells __napi_busy_loop() to abort if we need to reschedule rather than drop the RCU read lock and reschedule. Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20230608163839.2891748-3-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-09 10:01:09 -08:00
Stefan Roesch	13d381b440	net: split off __napi_busy_poll from napi_busy_poll This splits off the key part of the napi_busy_poll function into its own function, __napi_busy_poll, and changes the prefer_busy_poll bool to be flag based to allow passing in more flags in the future. This is done in preparation for an additional napi_busy_poll() function, that doesn't take the rcu_read_lock(). The new function is introduced in the next patch. Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20230608163839.2891748-2-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-09 10:01:09 -08:00
Tony Solomonik	b4bb1900c1	io_uring: add support for ftruncate Adds support for doing truncate through io_uring, eliminating the need for applications to roll their own thread pool or offload mechanism to be able to do non-blocking truncates. Signed-off-by: Tony Solomonik <tony.solomonik@gmail.com> Link: https://lore.kernel.org/r/20240202121724.17461-3-tony.solomonik@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 09:04:39 -07:00
Tony Solomonik	5f0d594c60	Add do_ftruncate that truncates a struct file do_sys_ftruncate receives a file descriptor, fgets the struct file, and finally actually truncates the file. do_ftruncate allows for passing in a file directly, with the caller already holding a reference to it. Signed-off-by: Tony Solomonik <tony.solomonik@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240202121724.17461-2-tony.solomonik@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-09 08:57:56 -07:00
Kunwu Chan	a6e959bd3d	io_uring: Simplify the allocation of slab caches commit `0a31bd5f2b` ("KMEM_CACHE(): simplify slab cache creation") introduces a new macro. Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Link: https://lore.kernel.org/r/20240130100247.81460-1-chentao@kylinos.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	da08d2edb0	io_uring: re-arrange struct io_ring_ctx to reduce padding Nothing major here, just moving a few things around to reduce the padding. This reduces the size on a non-debug kernel from 1536 to 1472 bytes, saving a full cacheline. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	af5d68f889	io_uring/sqpoll: manage task_work privately Decouple from task_work running, and cap the number of entries we process at the time. If we exceed that number, push remaining entries to a retry list that we'll process first next time. We cap the number of entries to process at 8, which is fairly random. We just want to get enough per-ctx batching here, while not processing endlessly. Since we manually run PF_IO_WORKER related task_work anyway as the task never exits to userspace, with this we no longer need to add an actual task_work item to the per-process list. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	2708af1adc	io_uring: pass in counter to handle_tw_list() rather than return it No functional changes in this patch, just in preparation for returning something other than count from this helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	42c0905f0c	io_uring: cleanup handle_tw_list() calling convention Now that we don't loop around task_work anymore, there's no point in maintaining the ring and locked state outside of handle_tw_list(). Get rid of passing in those pointers (and pointers to pointers) and just do the management internally in handle_tw_list(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	3cdc4be114	io_uring/poll: improve readability of poll reference decrementing This overly long line is hard to read. Break it up by AND'ing the ref mask first, then perform the atomic_sub_return() with the value itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	9fe3eaea4a	io_uring: remove unconditional looping in local task_work handling If we have a ton of notifications coming in, we can be looping in here for a long time. This can be problematic for various reasons, mostly because we can starve userspace. If the application is waiting on N events, then only re-run if we need more events. Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	670d9d3df8	io_uring: remove next io_kiocb fetch in task_work running We just reversed the task_work list and that will have touched requests as well, just get rid of this optimization as it should not make a difference anymore. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	170539bdf1	io_uring: handle traditional task_work in FIFO order For local task_work, which is used if IORING_SETUP_DEFER_TASKRUN is set, we reverse the order of the lockless list before processing the work. This is done to process items in the order in which they were queued, as the llist always adds to the head. Do the same for traditional task_work, so we have the same behavior for both types. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	4c98b89175	io_uring: remove 'loops' argument from trace_io_uring_task_work_run() We no longer loop in task_work handling, hence delete the argument from the tracepoint as it's always 1 and hence not very informative. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	592b480543	io_uring: remove looping around handling traditional task_work A previous commit added looping around handling traditional task_work as an optimization, and while that may seem like a good idea, it's also possible to run into application starvation doing so. If the task_work generation is bursty, we can get very deep task_work queues, and we can end up looping in here for a very long time. One immediately observable problem with that is handling network traffic using provided buffers, where flooding incoming traffic and looping task_work handling will very quickly lead to buffer starvation as we keep running task_work rather than returning to the application so it can handle the associated CQEs and also provide buffers back. Fixes: `3a0c037b0e` ("io_uring: batch task_work") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	8435c6f380	io_uring/kbuf: cleanup passing back cflags We have various functions calculating the CQE cflags we need to pass back, but it's all the same everywhere. Make a number of the putting functions void, and just have the two main helps for this, io_put_kbuf() and io_put_kbuf_comp() calculate the actual mask and pass it back. While at it, cleanup how we put REQ_F_BUFFER_RING buffers. Before this change, we would call into __io_put_kbuf() only to go right back in to the header defined functions. As clearing this type of buffer is just re-assigning the buf_index and incrementing the head, this is very wasteful. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	949249e25f	io_uring/rw: remove dead file == NULL check Any read/write opcode has needs_file == true, which means that we would've failed the request long before reaching the issue stage if we didn't successfully assign a file. This check has been dead forever, and is really a leftover from generic code. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	4caa74fdce	io_uring: cleanup io_req_complete_post() Move the ctx declaration and assignment up to be generally available in the function, as we use req->ctx at the top anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	bfe30bfde2	io_uring: mark the need to lock/unlock the ring as unlikely Any of the fast paths will already have this locked, this helper only exists to deal with io-wq invoking request issue where we do not have the ctx->uring_lock held already. This means that any common or fast path will already have this locked, mark it as such. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	95041b93e9	io_uring: add io_file_can_poll() helper This adds a flag to avoid dipping dereferencing file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	521223d7c2	io_uring/cancel: don't default to setting req->work.cancel_seq Just leave it unset by default, avoiding dipping into the last cacheline (which is otherwise untouched) for the fast path of using poll to drive networked traffic. Add a flag that tells us if the sequence is valid or not, and then we can defer actually assigning the flag and sequence until someone runs cancelations. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:06 -07:00
Jens Axboe	4bcb982cce	io_uring: expand main struct io_kiocb flags to 64-bits We're out of space here, and none of the flags are easily reclaimable. Bump it to 64-bits and re-arrange the struct a bit to avoid gaps. Add a specific bitwise type for the request flags, io_request_flags_t. This will help catch violations of casting this value to a smaller type on 32-bit archs, like unsigned int. This creates a hole in the io_kiocb, so move nr_tw up and rsrc_node down to retain needing only cacheline 0 and 1 for non-polled opcodes. No functional changes intended in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-08 13:27:03 -07:00
Alexander Mikhalitsyn	5492a490e6	io_uring: use file_mnt_idmap helper Let's use file_mnt_idmap() as we do that across the tree. No functional impact. Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Link: https://lore.kernel.org/r/20240129180024.219766-2-aleksandr.mikhalitsyn@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-06 19:55:14 -07:00
Linus Torvalds	54be6c6c5a	Linux 6.8-rc3	2024-02-04 12:20:36 +00:00
Linus Torvalds	3f24fcdacd	Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmW/G4YACgkQ8vlZVpUN gaPTpwf/c/Fk27GV8ge9PQtR6gmir/lyw2qkvK3Z+12aEsblZRmyvElyZWjAuNQG bciQyltabIPOA4XxfsZOrdgYI42n0rTTFG7bmI0lr+BJM/HRw0tGGMN91FZla0FP EXv/AiHKCqlT5OFZbD+8n1TzfdOgWotjug1VLteXve3YKjkDgt5IQm/0Gx9hKBld IR8SrQlD/rYe+VPvaHz5G4u09Ne5pUE5fDj3xE23wxfU5cloVzuVRCSOGWUCTnCW T0v6sHeKrmiLC8tIOZkBjer4nXC0MOu0p5+geAjwOArc9VJ1Lh2eAkH+GgDOVprx ahdl2qmbIbacBYECIeQ/+1i78+O1yw== =CmYr -----END PGP SIGNATURE----- Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code" * tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type ext4: make ext4_map_blocks() distinguish delalloc only extent ext4: add a hole extent entry in cache after punch ext4: correct the hole length returned by ext4_map_blocks() ext4: convert to exclusive lock while inserting delalloc extents ext4: refactor ext4_da_map_blocks() ext4: remove 'needed' in trace_ext4_discard_preallocations ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations ext4: remove unused return value of ext4_mb_release_group_pa ext4: remove unused return value of ext4_mb_release_inode_pa ext4: remove unused return value of ext4_mb_release ext4: remove unused ext4_allocation_context::ac_groups_considered ext4: remove unneeded return value of ext4_mb_release_context ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*() ext4: remove unused return value of __mb_check_buddy ext4: mark the group block bitmap as corrupted before reporting an error ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() ...	2024-02-04 07:33:01 +00:00
Linus Torvalds	9e28c7a23b	five smb3 client fixes, mostly multichannel related -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmW+oQoACgkQiiy9cAdy T1E6GwwAk9PQo1N8h6AI0PWCPoiHps8YzpISTgBfchDs+sIvo1sZEVMzTmSlWm4F cON4nb3QBkD4aqWbCdb1tjby2VAb0aHuRbCQPAczW4+R7He8ELlGYow7IPBWmyI8 CTQRirYrJuIePh1aT2UG4mykNyQzp5i5ycsdcWFiIGliXrTp0De0rvE/60KGLuJ/ lnDZp7+FHytIfpwVzl4bGax4odQh14whxdQme4C9a8kVfYPQGKFyADbuxy0KmWmu 8kkEcGpY/bPdqLE1tGzNoVci9cEdYK6yUzkc8dj2ddsZ7YDHPz0NtsdWlC517qt+ ekDADHRQZiKwyJzMGHibK8E4WoP0+JjB4j6OTc369ICzgjuIutWCCIexbS0IV/po IuyRQTvLSTfYpE8eoQavAKaty6sDnJcP7FepPtH2k5yGr4+ILZV4ozoH+hqzn2/K sf/q9ZfkyfaEi2JkMD9TTv7k2EWLMntpxklbpg59VKHY5MGvlQqoRl9b+jzfgKEZ xS/myr3e =q5wf -----END PGP SIGNATURE----- Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: "Five smb3 client fixes, mostly multichannel related: - four multichannel fixes including fix for channel allocation when multiple inactive channels, fix for unneeded race in channel deallocation, correct redundant channel scaling, and redundant multichannel disabling scenarios - add warning if max compound requests reached" * tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: increase number of PDUs allowed in a compound request cifs: failure to add channel on iface should bump up weight cifs: do not search for channel if server is terminating cifs: avoid redundant calls to disable multichannel cifs: make sure that channel scaling is done only once	2024-02-04 07:26:19 +00:00
Linus Torvalds	fc86e5c990	Bug fixes for 6.8-rc3: * Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format attribute fork. * Remove conditional compilation of realtime geometry validator functions to prevent confusing error messages from being printed on the console during the mount operation. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZbo7FAAKCRAH7y4RirJu 9Cn+APsFEbHA8YQpCSxGDM+Xelez9X7wroi6QkyOxRP6Lqq6ogD6A96uuV86TxkQ Hkse9IAKkFoLmyzohT9u7Bv46M/X4w8= =Ez8Z -----END PGP SIGNATURE----- Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Chandan Babu: - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format attribute fork - Remove conditional compilation of realtime geometry validator functions to prevent confusing error messages from being printed on the console during the mount operation * tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove conditional building of rt geometry validator functions xfs: reset XFS_ATTR_INCOMPLETE filter on node removal	2024-02-04 07:22:51 +00:00
Linus Torvalds	3a0e922079	Char/Misc driver fixes for 6.8-rc3 Here are three tiny driver fixes for 6.8-rc3. They include: - Android binder long-term bug with epoll finally being fixed - fastrpc driver shutdown bugfix - open-dice lockdep fix All of these have been in linux-next this week with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZb6yeA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yktLwCgwz0RakMt9dqjbip/1NJAXaRlOAgAoLWeyBLt qBVv8Y50No3dxCuKbsJZ =fqm2 -----END PGP SIGNATURE----- Merge tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are three tiny driver fixes for 6.8-rc3. They include: - Android binder long-term bug with epoll finally being fixed - fastrpc driver shutdown bugfix - open-dice lockdep fix All of these have been in linux-next this week with no reported issues" * tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: binder: signal epoll threads of self-work misc: open-dice: Fix spurious lockdep warning misc: fastrpc: Mark all sessions as invalid in cb_remove	2024-02-04 07:01:39 +00:00

1 2 3 4 5 ...

1249647 commits