linux-stable/io_uring
Lin Ma b01f4ae68d io_uring/poll: fix poll_refs race with cancelation
[ Upstream commit 12ad3d2d6c ]

There is an interesting race condition on poll_refs which can result
in a NULL pointer dereference. The crash trace looks like this:

KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
Code: ...
RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
 handle_tw_list io_uring/io_uring.c:1037 [inline]
 tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
 task_work_run+0x13a/0x1b0 kernel/task_work.c:177
 get_signal+0x2402/0x25a0 kernel/signal.c:2635
 arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
 exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
 exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is a small oversight in io_poll_check_events() when it
runs concurrently with the poll cancel routine io_poll_cancel_req().

The interleaving to trigger use-after-free:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
  v = atomic_read(...)                     |
  // poll_refs unchanged                   |
                                           |
  if (v & IO_POLL_CANCEL_FLAG)             |
   return -ECANCELED;                      |
  // io_poll_check_events return           |
  // will go into                          |
  // io_req_complete_failed() free req     |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into io_req_complete_failed()

And the interleaving to trigger the kernel WARNING:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
  v = atomic_read(...)                     |
  // v = IO_POLL_CANCEL_FLAG               |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
                                           |
  WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))    |
  // v & IO_POLL_REF_MASK = 0 WARN         |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into io_req_complete_failed()

After reading the source code and discussing it with Pavel, the intent
of the atomic poll refs is that io_poll_check_events() keeps looping
only to prevent anyone else from grabbing ownership. Therefore, this
patch simply adds another AND operation so that the loop stops when
poll_refs is exactly equal to IO_POLL_CANCEL_FLAG. Since
io_poll_cancel_req() grabs ownership and eventually reaches
io_req_complete_failed(), the req is still reclaimed as expected.
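
The effect of the extra AND can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel
code: it uses C11 atomics in place of the kernel's atomic_t helpers, the
flag and mask values are assumed for illustration, and the helper names
(sub_return, old_loop_continues, new_loop_continues, get_ownership,
replay) are hypothetical. It replays the first steps of both
interleavings above: the owner reads v = 1, the canceller ORs in
IO_POLL_CANCEL_FLAG, and the owner then drops its references.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed values for illustration; the real ones live in io_uring/poll.c. */
#define IO_POLL_CANCEL_FLAG (1u << 31)
#define IO_POLL_REF_MASK    (IO_POLL_CANCEL_FLAG - 1)

/* Model of atomic_sub_return(): subtract and return the new value. */
static unsigned sub_return(atomic_uint *refs, unsigned n)
{
    return atomic_fetch_sub(refs, n) - n;
}

/* Pre-patch loop condition: loop while any bit, flag included, is left. */
static bool old_loop_continues(atomic_uint *refs, unsigned v)
{
    return sub_return(refs, v & IO_POLL_REF_MASK) != 0;
}

/* Post-patch loop condition: loop only while reference bits are left. */
static bool new_loop_continues(atomic_uint *refs, unsigned v)
{
    return (sub_return(refs, v & IO_POLL_REF_MASK) & IO_POLL_REF_MASK) != 0;
}

/* Ownership rule as modeled here: the caller that raises the reference
 * count from zero becomes the owner. */
static bool get_ownership(atomic_uint *refs)
{
    return !(atomic_fetch_add(refs, 1) & IO_POLL_REF_MASK);
}

/* Replay the racy prefix; report whether the owner keeps looping. */
static bool replay(bool patched)
{
    atomic_uint poll_refs = 1;

    unsigned v = atomic_load(&poll_refs);              /* CPU0: v = 1 */
    atomic_fetch_or(&poll_refs, IO_POLL_CANCEL_FLAG);  /* CPU1: mark cancelled */

    /* CPU0: drop refs; poll_refs is now exactly IO_POLL_CANCEL_FLAG. */
    return patched ? new_loop_continues(&poll_refs, v)
                   : old_loop_continues(&poll_refs, v);
}

/* With poll_refs == IO_POLL_CANCEL_FLAG the canceller can take ownership. */
static bool canceller_gets_ownership(void)
{
    atomic_uint poll_refs = IO_POLL_CANCEL_FLAG;
    return get_ownership(&poll_refs);
}
```

In this model replay(false) returns true: the old condition keeps the
owner in the loop even though canceller_gets_ownership() shows a second
party can now become owner, which is the double-owner window described
above. replay(true) returns false, so the owner leaves the loop and the
canceller alone completes the request.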

Fixes: aa43477b04 ("io_uring: poll rework")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: tweak description and code style]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02 17:43:09 +01:00
..
Makefile
advise.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
advise.h
alloc_cache.h
cancel.c io_uring: fix off-by-one in sync cancelation file check 2022-08-23 07:26:08 -06:00
cancel.h
epoll.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
epoll.h
fdinfo.c io_uring: fix fdinfo sqe offsets calculation 2022-10-21 12:39:29 +02:00
fdinfo.h
filetable.c io_uring/filetable: fix file reference underflow 2022-12-02 17:43:09 +01:00
filetable.h
fs.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
fs.h
io-wq.c io-wq: Fix memory leak in worker creation 2022-10-26 12:22:57 +02:00
io-wq.h
io_uring.c io_uring: fix multishot accept request leaks 2022-11-26 09:27:49 +01:00
io_uring.h io_uring: fix multishot accept request leaks 2022-11-26 09:27:49 +01:00
kbuf.c io_uring: check for rollover of buffer ID when providing buffers 2022-11-16 10:04:09 +01:00
kbuf.h io_uring/kbuf: fix not advancing READV kbuf ring 2022-09-07 10:36:10 -06:00
msg_ring.c io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd() 2022-10-29 10:08:35 +02:00
msg_ring.h
net.c io_uring: fix multishot recv request leaks 2022-11-26 09:27:49 +01:00
net.h io_uring/net: rename io_sendzc() 2022-10-21 12:39:27 +02:00
nop.c
nop.h
notif.c io_uring/notif: Remove the unused function io_notif_complete() 2022-09-05 11:42:39 -06:00
notif.h io_uring/net: simplify zerocopy send user API 2022-09-01 09:13:33 -06:00
opdef.c io_uring/net: rename io_sendzc() 2022-10-21 12:39:27 +02:00
opdef.h io_uring: add custom opcode hooks on fail 2022-10-21 12:37:32 +02:00
openclose.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
openclose.h
poll.c io_uring/poll: fix poll_refs race with cancelation 2022-12-02 17:43:09 +01:00
poll.h
refs.h
rsrc.c io_uring/af_unix: defer registered files gc to io_uring release 2022-10-21 12:37:33 +02:00
rsrc.h Revert "io_uring: rename IORING_OP_FILES_UPDATE" 2022-09-01 09:13:33 -06:00
rw.c io_uring/rw: remove leftover debug statement 2022-10-29 10:08:33 +02:00
rw.h io_uring/rw: don't lose partial IO result on fail 2022-10-21 12:37:32 +02:00
slist.h
splice.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
splice.h
sqpoll.c
sqpoll.h
statx.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
statx.h
sync.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
sync.h
tctx.c
tctx.h
timeout.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
timeout.h
uring_cmd.c lsm/stable-6.0 PR 20220829 2022-08-31 09:23:16 -07:00
uring_cmd.h
xattr.c io_uring: make io_kiocb_to_cmd() typesafe 2022-08-12 17:01:00 -06:00
xattr.h