Commit graph

69657 commits

Author SHA1 Message Date
Vivek Goyal
3466958beb fuse: invalidate attrs when page writeback completes
In fuse when a direct/write-through write happens we invalidate attrs
because that might have updated mtime/ctime on server and cached
mtime/ctime will be stale.

What about page writeback path.  Looks like we don't invalidate attrs
there.  To be consistent, invalidate attrs in writeback path as well.  Only
exception is when writeback_cache is enabled.  In that case we strust local
mtime/ctime and there is no need to invalidate attrs.

Recently users started experiencing failure of xfstests generic/080,
geneirc/215 and generic/614 on virtiofs.  This happened only newer "stat"
utility and not older one.  This patch fixes the issue.

So what's the root cause of the issue.  Here is detailed explanation.

generic/080 test does mmap write to a file, closes the file and then checks
if mtime has been updated or not.  When file is closed, it leads to
flushing of dirty pages (and that should update mtime/ctime on server).
But we did not explicitly invalidate attrs after writeback finished.  Still
generic/080 passed so far and reason being that we invalidated atime in
fuse_readpages_end().  This is called in fuse_readahead() path and always
seems to trigger before mmaped write.

So after mmaped write when lstat() is called, it sees that atleast one of
the fields being asked for is invalid (atime) and that results in
generating GETATTR to server and mtime/ctime also get updated and test
passes.

But newer /usr/bin/stat seems to have moved to using statx() syscall now
(instead of using lstat()).  And statx() allows it to query only ctime or
mtime (and not rest of the basic stat fields).  That means when querying
for mtime, fuse_update_get_attr() sees that mtime is not invalid (only
atime is invalid).  So it does not generate a new GETATTR and fill stat
with cached mtime/ctime.  And that means updated mtime is not seen by
xfstest and tests start failing.

Invalidating attrs after writeback completion should solve this problem in
a generic manner.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal
550a7d3bc0 fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID
When posix access ACL is set, it can have an effect on file mode and it can
also need to clear SGID if.

- None of caller's group/supplementary groups match file owner group.
AND
- Caller is not priviliged (No CAP_FSETID).

As of now fuser server is responsible for changing the file mode as
well. But it does not know whether to clear SGID or not.

So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR
to let file server know that sgid needs to be cleared as well.

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal
52a4c95f4d fuse: extend FUSE_SETXATTR request
Fuse client needs to send additional information to file server when it
calls SETXATTR(system.posix_acl_access), so add extra flags field to the
structure.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Alessio Balsini
6076f5f341 fuse: fix matching of FUSE_DEV_IOC_CLONE command
With commit f8425c9396 ("fuse: 32-bit user space ioctl compat for fuse
device") the matching constraints for the FUSE_DEV_IOC_CLONE ioctl command
are relaxed, limited to the testing of command type and number.  As Arnd
noticed, this is wrong as it wouldn't ensure the correctness of the data
size or direction for the received FUSE device ioctl.

Fix by bringing back the comparison of the ioctl received by the FUSE
device to the originally generated FUSE_DEV_IOC_CLONE.

Fixes: f8425c9396 ("fuse: 32-bit user space ioctl compat for fuse device")
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Bhaskar Chowdhury
aa6ff555f0 fuse: fix a typo
s/reponsible/responsible/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Miklos Szeredi
a73d47f577 fuse: don't zero pages twice
All callers of fuse_short_read already set the .page_zeroing flag, so no
need to do the tail zeroing again.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:56 +02:00
Connor Kuehl
4b91459ad2 fuse: fix typo for fuse_conn.max_pages comment
'Maxmum' -> 'Maximum'

Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:56 +02:00
Vivek Goyal
4f06dd92b5 fuse: fix write deadlock
There are two modes for write(2) and friends in fuse:

a) write through (update page cache, send sync WRITE request to userspace)

b) buffered write (update page cache, async writeout later)

The write through method kept all the page cache pages locked that were
used for the request.  Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.

The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.

For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.

Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.

If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data.  After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data.  While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.

In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.

Partial write of a not uptodate page still needs to be handled.  One way
would be to read the complete page before doing the write.  This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.

The other solution is to serialize the synchronous write with reads from
the partial pages.  The easiest way to do this is to keep the partial pages
locked.  The problem is that a write() may involve two such pages (one head
and one tail).  This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.

Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:56 +02:00
Linus Torvalds
d83e98f9d8 io_uring-5.12-2021-04-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBoyXQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiOXD/4wnXFHOmt5HMHmEXU4bB5b46snh3iCU4QM
 w7pqgPMgS8uRZZ16FWpzDwccC2NZDoFRHhFxDrOqdKMO1DQlzk1yfSeKjNs9h2Oe
 mJ6JuuaX6bEk2RRmjqqM5aMCazE2+tWtDydveL9OrJO6uiPcZYFPiRDwafDzQo4n
 YPhUhPm+dqn/4Li6Fap2ieCeLqXNEUKpA0/Bd9QQV0Q/oZq5oLdJefIk8EMXH/tf
 eKH2muZDjOV0FYdG8lPsNAF0c5qJ/aID+jlhyUz8Bkn31lOS96d5rzXoFq+AonsC
 gVwLbaMcAibHrDjNxIQGcEU0VSjvvfy9GAfjJ3uSuyjpE4dNMe2fuU/B3rF6xb6E
 upEfAik+frvzfFuZx11SZh/JwNBatJh35DVZZczI48YKk+s2MI0q9+lNINLtL6bD
 3J287jnZbETU54WCMruiRwjQ3J1YWOj2pfxrPT9J0NZdQ0r7MXsDJecGNu+nL0X3
 Ry7IpXUqabCf+4+XrGZ2NG6/kd5D/smatc5FqTkyeih3mw94iprNuWanzBlUZYQV
 8ybrVtW37caAd1vyx0/p2I1LV5Y0eNGpSWz/WTDj0FINPCPjLq/VmPsIYgihmvz4
 joWEzGnBKqds/CAvSneyz2d9MVlQ1083Az/Vi9g4w0IHG/p3ekQGBPDFBvQyszq9
 pNkMpQ9Wxw==
 =edGl
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block

POull io_uring fix from Jens Axboe:
 "Just fixing a silly braino in a previous patch, where we'd end up
  failing to compile if CONFIG_BLOCK isn't enabled.

  Not that a lot of people do that, but kernel bot spotted it and it's
  probably prudent to just flush this out now before -rc6.

  Sorry about that, none of my test compile configs have !CONFIG_BLOCK"

* tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block:
  io_uring: fix !CONFIG_BLOCK compilation failure
2021-04-03 14:26:47 -07:00
Linus Torvalds
8e29be3468 Two more gfs2 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmBomgAUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqWkA//bGV+XgrDas0mBjAQiGBcqC9M1Ts1
 cffrJ2E9E5J/Cnn+/wAmkGYZU3oHpdmNI4DytCneHhfY9YamSlg6B9MDxebF3Iuu
 eXiWB4rXEf7l5AG5ywUloXl171TTxOQDo5bbOlvYeDWPZ/WYEBsdZ/rxF9yIent/
 RS5HMt9I9x6c2WQGTlBvo8D618WyfxzkX5AjvhojITgX5UItZg6bOKkcuqcQ9PNG
 5MGnHCCJU2Zh3Y6gGZjp3rQnxRAzpBhFdVrXaZgTAtKLycsGHMcjqfmST6N2si8l
 cCDgujRhRfIOpQZs0vOdcJVtpNBfDxgOO5JlTdkY6Grh/STYoN9dFo7V0SnYhMJE
 FdBgfHwNyyBEo/QV3gDXrvITxw+xypACjS/znffArxFySNzSfv2oWPCxOJvbel0L
 H4y+gbJ4R+QUzkUuvnXjxjl7c70jK+flLbzUxXxeSQBmOHtCiHZJK43UnCq+fpmZ
 hOrUaYHvCV1iC/9OeAy1N8MlXicUHnpmu/7q7GEGaRTV4zN85MjddOiRKAz6RiAl
 2nB64GbLDFjY3HF8+/giEwAWikcYPk2W8S0WmX9Bjn+UdMYuH1cWF/TLwWrSmQJX
 MQmYlwLOj+UKg8Ku0+Klh2c8oMxo7C8o9t9BjBJio5od0U/sen9v/08DUobk4jwN
 hhjW+as1ZV/+Tpc=
 =TP8w
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:
 "Two more gfs2 fixes"

* tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: report "already frozen/thawed" errors
  gfs2: Flag a withdraw if init_threads() fails
2021-04-03 12:15:01 -07:00
Jens Axboe
e82ad48539 io_uring: fix !CONFIG_BLOCK compilation failure
kernel test robot correctly pinpoints a compilation failure if
CONFIG_BLOCK isn't set:

fs/io_uring.c: In function '__io_complete_rw':
>> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
    2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
         |                                                ^~~~~~~~~~~~~~~~~~~~
         |                                                io_rw_reissue
    cc1: some warnings being treated as errors

Ensure that we have a stub declaration of io_rw_should_reissue() for
!CONFIG_BLOCK.

Fixes: 230d50d448 ("io_uring: move reissue into regular IO path")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 19:45:34 -06:00
Linus Torvalds
d93a0d43e3 block-5.12-2021-04-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBnh84QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpubkD/0Y+l3cecjzds3RnRXEXYsRFBKGfK6c7Uuu
 QVCrlRp6tKPmBoDLQyl95Mg0e44pR4s3Bw5W4j9GmJtVyNVzC2x3dqXn3uXSFca/
 KU+4GIzl2VIXS5Pn90GLE6/xw3FtVy8w2c6V3g4jkLR29bexPdO4s57cohxKR9kL
 ZU+icCag9RlNIYkuB79Wy6Y3/m41L5WRkMGiMb0sJS9Q+k+zetZNIeNIxWn4E1zF
 qWymdyBFx31qL8/2ZmRwb8XzF5qE2XimXz1a7ZX754zyR/Ry5rGc0h+JjqgUhSV9
 wM2gLlMNEP+k+8DOU9ACYdff18P6b+RZ8mJnGZjZseAut1qJXonVtgDoWX7mEs9+
 8Gl+n18TYpKEfzLiOOOtu/xeZYMjp0MUjO6iHTpzRfqBjKNoZGTuz0wGC5nX/ZYI
 y5QWifI0NmMmTPDJpH6nVYzqDLbEZzcMz6WeOfhKQ/yv7gOxj+BFGJ3olJ+DAx8c
 e6HDPa/WkC0iqie5cpzYjmve0HrKJADMMrRRWGRkgmOZ8uAaSS17rZExg1CICr8I
 bOVYsrPsg8ErKVvzlx/DK6EfhNrw0+Db7paYccl2a3pXx/T8iHmW3RSqn7jMrhA1
 7QPOCUMKuWuaOupWJWw25gxNS3viJa57/hxMG1nvAgpJx6QvBNaLrwcIWXO1cfrp
 boe/UFnftg==
 =8odY
 -----END PGP SIGNATURE-----

Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Remove comment that never came to fruition in 22 years of development
   (Christoph)

 - Remove unused request flag (Christoph)

 - Fix for null_blk fake timeout handling (Damien)

 - Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)

 - Error propagation fix for multiple split bios (Yufen)

* tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  block: remove the unused RQF_ALLOCED flag
  block: update a few comments in uapi/linux/blkpg.h
  block: don't ignore REQ_NOWAIT for direct IO
  null_blk: fix command timeout completion handling
  block: only update parent bi_status when bio fail
2021-04-02 16:13:13 -07:00
Linus Torvalds
1faccb6394 io_uring-5.12-2021-04-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBnh+kQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpo3AEACSddwiafCkKLQyl5oaIdrzP1ANvH3vWOyD
 MCbcf0NR5W1dcYS4JSA3fmrXpBVYL5tPdAxYcbachBhK2zYJaWuZtgQlB3ofYiNo
 x1nRFsJXcY/vNBCrZo5xJTgRHyvsNrviZFgb2OOy9Cv2IDn0riJSciPr+A1cIE6J
 Tn1lhGaWHDcboWl2oYUAGUWimkmTuuCcwpP6KCuBVRkTc+C1v4sRy2EO/84AQUBc
 XQWov8IUCDISlZmiukktr4a1+9vL4PbsLDRw2Zc8ZH6oTuNIju8sQgxyzm/EN4Uz
 D3oJ/YEHNUfW+divI3djqwNBiskcl9SUcpgzPwkWOJf+YcUE6iGNJPwJ9B+1NiH9
 WKmgjulRrDMTO9/flK8+GpAegDjaPUXcM4nd1ItQGHX6GHxCIWYaNHsngWgWebSy
 +wjOlwRxCdgRRhwAWQwu8k5O85UjCLO8uq4mK0TA2GTz5QzGVa9dQaqovMpsHAOb
 8TtxWdRFePZIl3CXB3r6nSFQv3S9d70Dq5+Mgq7pz9+n0vGfV6cTbWPIbne2V7g+
 +IaZlVLQXu8WRTf/sTq91LWyaJrJiMEsY7dts+8K9lGsdFT0PJIxf6VeuZpBYCBg
 B+JBHpdlMBZhTjltEzEubBUQZog+cQkway90Q7MtL4Ue+qwV4WbgLziHTyzL3GmI
 cQiujMlcRg==
 =pxfZ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Nothing really major in here, and finally nothing really related to
  signals. A few minor fixups related to the threading changes, and some
  general fixes, that's it.

  There's the pending gdb-get-confused-about-arch, but that's more of a
  cosmetic issue, nothing that hinder use of it. And given that other
  archs will likely be affected by that oddity too, better to postpone
  any changes there until 5.13 imho"

* tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  io_uring: move reissue into regular IO path
  io_uring: fix EIOCBQUEUED iter revert
  io_uring/io-wq: protect against sprintf overflow
  io_uring: don't mark S_ISBLK async work as unbounded
  io_uring: drop sqd lock before handling signals for SQPOLL
  io_uring: handle setup-failed ctx in kill_timeouts
  io_uring: always go for cancellation spin on exec
2021-04-02 16:08:19 -07:00
Jens Axboe
230d50d448 io_uring: move reissue into regular IO path
It's non-obvious how retry is done for block backed files, when it happens
off the kiocb done path. It also makes it tricky to deal with the iov_iter
handling.

Just mark the req as needing a reissue, and handling it from the
submission path instead. This makes it directly obvious that we're not
re-importing the iovec from userspace past the submit point, and it means
that we can just reuse our usual -EAGAIN retry path from the read/write
handling.

At some point in the future, we'll gain the ability to always reliably
return -EAGAIN through the stack. A previous attempt on the block side
didn't pan out and got reverted, hence the need to check for this
information out-of-band right now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 09:24:20 -06:00
Pavel Begunkov
f8b78caf21 block: don't ignore REQ_NOWAIT for direct IO
If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 08:34:30 -06:00
Pavel Begunkov
07204f2157 io_uring: fix EIOCBQUEUED iter revert
iov_iter_revert() is done in completion handlers that happensf before
read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards.
Moreover, even though it may appear being just a no-op, it's actually
races with 1) user forging a new iovec of a different size 2) reissue,
that is done via io-wq continues completely asynchronously.

Fixes: 3e6a0d3c75 ("io_uring: fix -EAGAIN retry with IOPOLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 09:31:21 -06:00
Pavel Begunkov
696ee88a7c io_uring/io-wq: protect against sprintf overflow
task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 09:21:18 -06:00
Jens Axboe
4b982bd0f3 io_uring: don't mark S_ISBLK async work as unbounded
S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 08:56:28 -06:00
Tetsuo Handa
5e46d1b78a reiserfs: update reiserfs_xattrs_initialized() condition
syzbot is reporting NULL pointer dereference at reiserfs_security_init()
[1], for commit ab17c4f021 ("reiserfs: fixup xattr_root caching")
is assuming that REISERFS_SB(s)->xattr_root != NULL in
reiserfs_xattr_jcreate_nblocks() despite that commit made
REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
case possible.

I guess that commit 6cb4aff0a7 ("reiserfs: fix oops while creating
privroot with selinux enabled") wanted to check xattr_root != NULL
before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking
about the xattr root.

  The issue is that while creating the privroot during mount
  reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
  dereferences the xattr root. The xattr root doesn't exist, so we get
  an oops.

Therefore, update reiserfs_xattrs_initialized() to check both the
privroot and the xattr root.

Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1]
Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cb4aff0a7 ("reiserfs: fix oops while creating privroot with selinux enabled")
Acked-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-30 14:27:32 -07:00
Jens Axboe
82734c5b1b io_uring: drop sqd lock before handling signals for SQPOLL
Don't call into get_signal() with the sqd mutex held, it'll fail if we're
freezing the task and we'll get complaints on locks still being held:

====================================
WARNING: iou-sqp-8386/8387 still has locks held!
5.12.0-rc4-syzkaller #0 Not tainted
------------------------------------
1 lock held by iou-sqp-8386/8387:
 #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731

 stack backtrace:
 CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
  try_to_freeze include/linux/freezer.h:66 [inline]
  get_signal+0x171a/0x2150 kernel/signal.c:2576
  io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748

Fold the get_signal() case in with the parking checks, as we need to drop
the lock in both cases, and since we need to be checking for parking when
juggling the lock anyway.

Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
Fixes: dbe1bdbb39 ("io_uring: handle signals for IO threads like a normal thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-30 14:36:46 -06:00
Pavel Begunkov
51520426f4 io_uring: handle setup-failed ctx in kill_timeouts
general protection fault, probably for non-canonical address
	0xdffffc0000000018: 0000 [#1] KASAN: null-ptr-deref
	in range [0x00000000000000c0-0x00000000000000c7]
RIP: 0010:io_commit_cqring+0x37f/0xc10 fs/io_uring.c:1318
Call Trace:
 io_kill_timeouts+0x2b5/0x320 fs/io_uring.c:8606
 io_ring_ctx_wait_and_kill+0x1da/0x400 fs/io_uring.c:8629
 io_uring_create fs/io_uring.c:9572 [inline]
 io_uring_setup+0x10da/0x2ae0 fs/io_uring.c:9599
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

It can get into wait_and_kill() before setting up ctx->rings, and hence
io_commit_cqring() fails. Mimic poll cancel and do it only when we
completed events, there can't be any requests if it failed before
initialising rings.

Fixes: 80c4cbdb5e ("io_uring: do post-completion chore on t-out cancel")
Reported-by: syzbot+0e905eb8228070c457a0@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/660261a48f0e7abf260c8e43c87edab3c16736fa.1617014345.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-29 06:48:26 -06:00
Pavel Begunkov
5a978dcfc0 io_uring: always go for cancellation spin on exec
Always try to do cancellation in __io_uring_task_cancel() at least once,
so it actually goes and cleans its sqpoll tasks (i.e. via
io_sqpoll_cancel_sync()), otherwise sqpoll task may submit new requests
after cancellation and it's racy for many reasons.

Fixes: 521d6a737a ("io_uring: cancel sqpoll via task_work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a21bd6d794bb1629bc906dd57a57b2c2985a8ac.1616839147.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-28 18:11:53 -06:00
Linus Torvalds
81b1d39fd3 5 cifs/smb3 fixes, 2 for stable, includes an important fix for encryption and an ACL fix, as well as a fix for possible reflink data corruption
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBfx9sACgkQiiy9cAdy
 T1F/igv8DsHxOJLw9kc5pBrbmUEgsUpQbdRMESqEROyqKte80jga2P3wsvJQYqQY
 JwHPxh477eiRSpSkEWSFDMmELsVtoQIYv3aqgPe79668eCd97mHRM2ItSV++5x9M
 iJ0N8GuiVARSyKmndrZ9gvbPoJb4TKkPX6X44pDSgAkgskvTTFKTywZaY5IEiqKe
 9zBWghZbNnWhtYG+2On2M2tzy8/Fo8aveLxhFhJstZ0IP6Px+Rg9GMdzRAfDJqK+
 QQMwcqmRKjVo4/Z6yji/s9OI1+eQyIAKLa6cyB0Yd+AqnvDYv1dagkRAjRCHl/Ri
 28loxGatXeXjXJGYU58EjNkKdoBUh09idJJolcMGwPSteL2j1DQDV9utbZLhAWPq
 yNugiIkzbQj3Z55UQ3n3u79pztK31GZ2TOcwJbIqQs3tctJ5aqUIWjQibLVpaNBR
 7C5Yug9aC5gpr3LPIUD3AGZIUAenCzsVN5Y9br4SPx0/zHmbynyyF27w14shX8O/
 3uQr6xhl
 =NxLV
 -----END PGP SIGNATURE-----

Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Five cifs/smb3 fixes, two for stable.

  Includes an important fix for encryption and an ACL fix, as well as a
  fix for possible reflink data corruption"

* tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix cached file size problems in duplicate extents (reflink)
  cifs: Silently ignore unknown oplock break handle
  cifs: revalidate mapping when we open files for SMB1 POSIX
  cifs: Fix chmod with modefromsid when an older ACE already exists.
  cifs: Adjust key sizes and key generation routines for AES256 encryption
2021-03-28 12:06:21 -07:00
Linus Torvalds
b44d1ddcf8 io_uring-5.12-2021-03-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1KAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjVSD/0f1HdekXnIE6aSRQ7YEV8ux2t5wUeDyP8U
 cdcZ8fBW9PvKZLdODSI4sw8UYV5OYEBcfImFe3nRVHR+RIVQo72UTYvuHqeUYNct
 w3drgF2GEMIxJFZR6zf9LDrQVduPqXvbEJLui6TN+eX/5E99ZlUWMLwkX1k+vDju
 QfaGZjz2736GTn1MPc7jdyZKoK7eCi5xtNFPash5wGck7aYl5TGXnG/8bRYsv2Tw
 eCYKbvv4x0s8OFcYVQMooDfbIMCyyfTwt6YatFHQEtM/RM+M66gndvv3jfkeJQju
 hz0I8qOJ8X5lf0VucncWs5J8b9Whr5YZV+k9461xalBbV9ed2vzIIikP8DpCxtYz
 yKbsdDm0+3hwfuZOz+d7ooEXKsphJ1PnSsEeuNZXtKDXVtphksUbbq4H2NLINcsQ
 m6dwaRPSEA0EymngGY2e+8+CU0euiE4mqoMpw4D9m9Irs+BAaWYGk9xCWr0BGem0
 auZOMqvV2xktdBlGx1BJCLts1sHHxy8IM3u0852R/1AfcKOkXwNVPt62I8e9ceIA
 wc731aWHwJfS25m430xFDPJKJpUZoZgste4qwVym70CmRziuamgYyIfrfRg1ZjsD
 ZBa9Z4hPiT4e0eDqlYjcMpl9FORgYQXVXy5ofd/eZg5xkU8X+i6TVZkaQNkZyqV/
 4ogBZYUolg==
 =mwLC
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Use thread info versions of flag testing, as discussed last week.

 - The series enabling PF_IO_WORKER to just take signals, instead of
   needing to special case that they do not in a bunch of places. Ends
   up being pretty trivial to do, and then we can revert all the special
   casing we're currently doing.

 - Kill dead pointer assignment

 - Fix hashed part of async work queue trace

 - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS

 - Fix a link completion ordering regression in this merge window

 - Cancellation fixes

* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  io_uring: remove unsued assignment to pointer io
  io_uring: don't cancel extra on files match
  io_uring: don't cancel-track common timeouts
  io_uring: do post-completion chore on t-out cancel
  io_uring: fix timeout cancel return code
  Revert "signal: don't allow STOP on PF_IO_WORKER threads"
  Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
  Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
  Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
  kernel: stop masking signals in create_io_thread()
  io_uring: handle signals for IO threads like a normal thread
  kernel: don't call do_exit() for PF_IO_WORKER threads
  io_uring: maintain CQE order of a failed link
  io-wq: fix race around pending work on teardown
  io_uring: do ctx sqd ejection in a clear context
  io_uring: fix provide_buffers sign extension
  io_uring: don't skip file_end_write() on reissue
  io_uring: correct io_queue_async_work() traces
  io_uring: don't use {test,clear}_tsk_thread_flag() for current
2021-03-28 11:42:05 -07:00
Linus Torvalds
abed516ecd block-5.12-2021-03-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1YoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprivEADZx//LFwziicjD3Nd5XcLfMeE1su6+CULD
 SkGh8MALlB3/smeYa+gG5tb5U8l+7Xk62pDWXsRZj+Ckw/FDGql4qve6uSDqAIBz
 6W6PKjHLY81E8nVe/WHcvhQrxE8E9yZg/Hrg4FWLpcLbmJTt709Cm+FciHP8BAsR
 iv3gkBreMRrt9Xlfimn4XCsGaqbXg2Xx8AhaJBshhhjIXvirvB8ctNZvguNX4KFl
 ob+KTO1p26mTFxHLiaJt1fNJzj21XdMrT27FMPqylBF5s1Xr4U9plZHgTX6KMx3o
 BZx1QFTGiskgdKhR01AgzM4ASIWZAUDfpRgABfyWdqHTwqeJyHbcJ+emRpiGCyER
 Og3ar2m75WUA8+Pfgl9TusnNTCiRVYBAcMZGpGEbGKZt+cyCq2Ed161e2I7NPOxR
 c60/j4KHq3uBXh1FhNRX1Y9ZUiK031RqGhBCABeM0bnxImyEo96L3VXJ72RZOvjZ
 1lo9U35q7B6AaFlAesYH4/WaPIExy3RObVHUVtXokzcm4RFh9eycuxPdGc+HDZ04
 h8t6KaAKTtBadIIMWvz34SNykqM4Q0xcHrt8Wz+1C3FZfgc7rkQpVBZLjhk5fx8h
 33KeuMrATAFGvv9d0tbARbIXqXaFGwcc7Z0sSfVnzRfFM/aPa5xnIfGmbxoT5gH8
 v/6ySA3EWA==
 =ZaB3
 -----END PGP SIGNATURE-----

Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Fix regression from this merge window with the xarray partition
   change, which allowed partition counts that overflow the u8 that
   holds the partition number (Ming)

 - Fix zone append warning (Johannes)

 - Segmentation count fix for multipage bvecs (David)

 - Partition scan fix (Chris)

* tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  block: don't create too many partitions
  block: support zone append bvecs
  block: recalculate segment count for multi-segment discards correctly
  block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
2021-03-28 11:37:42 -07:00
Colin Ian King
2b8ed1c941 io_uring: remove unsued assignment to pointer io
There is an assignment to io that is never read after the assignment,
the assignment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
78d9d7c2a3 io_uring: don't cancel extra on files match
As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
2482b58ffb io_uring: don't cancel-track common timeouts
Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d595 ("io_uring: reliably cancel linked
timeouts").

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
80c4cbdb5e io_uring: do post-completion chore on t-out cancel
Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
1ee4160c73 io_uring: fix timeout cancel return code
When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Jens Axboe
dbe1bdbb39 io_uring: handle signals for IO threads like a normal thread
We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:07 -06:00
Steve French
cfc63fc812 smb3: fix cached file size problems in duplicate extents (reflink)
There were two problems (one of which could cause data corruption)
that were noticed with duplicate extents (ie reflink)
when debugging why various xfstests were being incorrectly skipped
(e.g. generic/138, generic/140, generic/142). First, we were not
updating the file size locally in the cache when extending a
file due to reflink (it would refresh after actimeo expires)
but xfstest was checking the size immediately which was still
0 so caused the test to be skipped.  Second, we were setting
the target file size (which could shrink the file) in all cases
to the end of the reflinked range rather than only setting the
target file size when reflink would extend the file.

CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:41:55 -05:00
Vincent Whitchurch
219481a8f9 cifs: Silently ignore unknown oplock break handle
Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1.  The debug message which is printed for
these unknown handles may also be misleading, so fix that too.

The SMB2 lease break path is not affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

 CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
 00000000: 424d53fe 00000040 00000000 00000012  .SMB@...........
 00000010: 00000001 00000000 ffffffff ffffffff  ................
 00000020: 00000000 00000000 00000000 00000000  ................
 00000030: 00000000 00000000 00000000 00000000  ................

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:05:26 -05:00
Ronnie Sahlberg
cee8f4f6fc cifs: revalidate mapping when we open files for SMB1 POSIX
RHBZ: 1933527

Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and when posix are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:04:58 -05:00
Shyam Prasad N
3bffbe9e0b cifs: Fix chmod with modefromsid when an older ACE already exists.
My recent fixes to cifsacl to maintain inherited ACEs had
regressed modefromsid when an older ACL already exists.

Found testing xfstest 495 with modefromsid mount option

Fixes: f506550889 ("cifs: Retain old ACEs when converting between mode bits and ACL")

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:04:35 -05:00
Shyam Prasad N
45a4546c61 cifs: Adjust key sizes and key generation routines for AES256 encryption
For AES256 encryption (GCM and CCM), we need to adjust the size of a few
fields to 32 bytes instead of 16 to accommodate the larger keys.

Also, the L value supplied to the key generator needs to be changed from
to 256 when these algorithms are used.

Keeping the ioctl struct for dumping keys of the same size for now.
Will send out a different patch for that one.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 07:49:39 -05:00
Linus Torvalds
701c09c988 for-5.12-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmBctBgACgkQxWXV+ddt
 WDu1nA//bzuPwW3nO+enE+ipi4t6UJTJpHLeDgdMshWwhBIHVt+oFxTUIt4Zd0kT
 0hJ+mbNrZHzmDmzpb6ifQn0D6k+wq6zbsEgLtwgmPmBszaXIw46FvnYnxd9FtCde
 9SQzBKa86i/KMkRtaIvpUcunniIo5Aj0Hvu0oPgTKObqiB4HP2nV6rKody+mP9JW
 RanWbBi0JvI4UE/J2Ud1sNWFdDtVpXpcktj1dsI8gbsYNR05HpM08SEUgeF/ts3I
 yB/L18I5CUeFHyo/yogbj7kkikugPGsmOj/A86UZ6x3NxWoC+m7UXoGrO2/qlFem
 qd3ioXZKlnPqeX29kAy/REa3xjE61istlDVC/vckqmXBfYc6WK/KAJvFAGI+/3VI
 9HvIbBokUQzekhFlA02RTqGcasStXX7VSeJyzyAbXjGhZQKfFTHR8ZBtrREiVBC9
 58K+g8SSqIb/9iJqYV4h82lSBRSdf9kHx7CSB2gOBuifihY+chVr4Xzhq12IlXbK
 TNlue0BTwYLJStwx2dnY2beLbLG34/4FNRsuAR/9JsCio7Bfj0qN8htIyvfsiMxr
 mkrH7+Ykd10FqC8uu6MHiW9k428871Era3B97TgyQ0V17ehh4IN0v9V7kckk9EWw
 3omaPwuF2FGfFOoTR7ipKO0nDx0/y2knnDSTsWknNG09Ciwa+Ww=
 =SuJv
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Fixes for issues that have some user visibility and are simple enough
  for this time of development cycle:

   - a few fixes for rescue= mount option, adding more checks for
     missing trees

   - fix sleeping in atomic context on qgroup deletion

   - fix subvolume deletion on mount

   - fix build with M= syntax

   - fix checksum mismatch error message for direct io"

* tag 'for-5.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix check_data_csum() error message for direct I/O
  btrfs: fix sleep while in non-sleep context during qgroup removal
  btrfs: fix subvolume/snapshot deletion not triggered on mount
  btrfs: fix build when using M=fs/btrfs
  btrfs: do not initialize dev replace for bad dev root
  btrfs: initialize device::fs_info always
  btrfs: do not initialize dev stats if we have no dev_root
  btrfs: zoned: remove outdated WARN_ON in direct IO
2021-03-25 15:38:22 -07:00
Pavel Begunkov
90b8749022 io_uring: maintain CQE order of a failed link
Arguably we want CQEs of linked requests be in a strict order of
submission as it always was. Now if init of a request fails its CQE may
be posted before all prior linked requests including the head of the
link. Fix it by failing it last.

Fixes: de59bc104c ("io_uring: fail links more in io_submit_sqe()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7a96b05832e7ab23ad55f84092a2548c4a888b0.1616699075.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-25 13:47:03 -06:00
Linus Torvalds
002322402d Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
  selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
  gcov, and mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mailmap: update Andrey Konovalov's email address
  mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  mm: memblock: fix section mismatch warning again
  kfence: make compatible with kmemleak
  gcov: fix clang-11+ support
  ia64: fix format strings for err_inject
  ia64: mca: allocate early mca with GFP_ATOMIC
  squashfs: fix xattr id and id lookup sanity checks
  squashfs: fix inode lookup sanity checks
  z3fold: prevent reclaim/free race for headless pages
  selftests/vm: fix out-of-tree build
  mm/mmu_notifiers: ensure range_end() is paired with range_start()
  kasan: fix per-page tags for non-page_alloc pages
  hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25 11:43:43 -07:00
Bob Peterson
ff132c5f93 gfs2: report "already frozen/thawed" errors
Before this patch, gfs2's freeze function failed to report an error
when the target file system was already frozen as it should (and as
generic vfs function freeze_super does. Similarly, gfs2's thaw function
failed to report an error when trying to thaw a file system that is not
frozen, as vfs function thaw_super does. The errors were checked, but
it always returned a 0 return code.

This patch adds the missing error return codes to gfs2 freeze and thaw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-25 18:53:38 +01:00
Phillip Lougher
8b44ca2b63 squashfs: fix xattr id and id lookup sanity checks
The checks for maximum metadata block size is missing
SQUASHFS_BLOCK_OFFSET (the two byte length count).

Link: https://lkml.kernel.org/r/2069685113.2081245.1614583677427@webmail.123-reg.co.uk
Fixes: f37aa4c736 ("squashfs: add more sanity checks in id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Sean Nyekjaer <sean@geanix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Sean Nyekjaer
c1b2028315 squashfs: fix inode lookup sanity checks
When mouting a squashfs image created without inode compression it fails
with: "unable to read inode lookup table"

It turns out that the BLOCK_OFFSET is missing when checking the
SQUASHFS_METADATA_SIZE agaist the actual size.

Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com
Fixes: eabac19e40 ("squashfs: add more sanity checks in inode lookup")
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25 09:22:55 -07:00
Jens Axboe
f5d2d23bf0 io-wq: fix race around pending work on teardown
syzbot reports that it's triggering the warning condition on having
pending work on shutdown:

WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
Modules linked in:
CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
 io_uring_files_cancel include/linux/io_uring.h:47 [inline]
 do_exit+0x258/0x2340 kernel/exit.c:780
 do_group_exit+0x168/0x2d0 kernel/exit.c:922
 get_signal+0x1734/0x1ef0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x465f69

which shouldn't happen, but seems to be possible due to a race on whether
or not the io-wq manager sees a fatal signal first, or whether the io-wq
workers do. If we race with queueing work and then send a fatal signal to
the owning task, and the io-wq worker sees that before the manager sets
IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
behind.

Just turn the WARN_ON_ONCE() into a cancelation condition instead.

Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-25 10:16:12 -06:00
Linus Torvalds
8a9d2e133e cachefiles, afs: mm wait fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmBaVsAACgkQ+7dXa6fL
 C2u/7w/8DU9UZN3IRgZzR47xw3qYlgNMWRoiJ2RwSHYDJcsFqziJ/6jN/MDr7vzc
 eo1XQnDUH1Ok02WNxI6iVIfkX6cC/SidCWs6mNevQ6ksn9ei8tG0ZUWLcUl1IA+O
 HzXxvouyL9aJB+aNTQXttoi8JaSuoW/HBV3MbjOLywsy41AicCpt0gI0AJgXHKe8
 nEz3mqWZpCywRTkVkt9sWFOMX2shUzy8SoFgLMNpDUgyMD4r98XVJdIH8X4Em3zE
 syLg92aOnxxTEOAAYefcOSsgDBIkxLqW6F/K884cTPgLC24RJ/LO+M4GoOWX1Cmj
 Gqy9DZ3TGTu9yXr6Cm32OMl6t1Y0rYnktNl1Z4OT0XibK4gxgohZEr811A1/pHHu
 OfPBIUAotKRS4o/scs8Au0+XMT0/R7qfsGZe+TUGzWG1CRzf+tOLMrgXPxWnh2fV
 E2eNfOzy2Ry5v0XB4Lb4tb0JVPM2WOBTbswgUIHUOLz7fT6+mVaFYK/8eDDu6EJH
 zmDxs7HLZvI6X6XB2DOCDDWJbzKk9Jo27raGV5o6QCwAKENIr8XAvgZBEg5+Quvc
 feNBNSWTplgB5ROPlRWgmy/Xh4Y4+uRMCzMN+q9FtC810bDCE5rY5TRnayxmx9ni
 XugpJnoMBM8QcbtHNxropGOg+gQpABYfSfZMmcNPd+Oyix3SbtQ=
 =/IaF
 -----END PGP SIGNATURE-----

Merge tag 'afs-cachefiles-fixes-20210323' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull cachefiles and afs fixes from David Howells:
 "Fixes from Matthew Wilcox for page waiting-related issues in
  cachefiles and afs as extracted from his folio series[1]:

   - In cachefiles, remove the use of the wait_bit_key struct to access
     something that's actually in wait_page_key format. The proper
     struct is now available in the header, so that should be used
     instead.

   - Add a proper wait function for waiting killably on the page
     writeback flag. This includes a recent bugfix[2] that's not in the
     afs code.

   - In afs, use the function added in (2) rather than using
     wait_on_page_bit_killable() which doesn't provide the
     aforementioned bugfix"

Link: https://lore.kernel.org/r/20210320054104.1300774-1-willy@infradead.org[1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [2]
Link: https://lore.kernel.org/r/20210323120829.GC1719932@casper.infradead.org/ # v1

* tag 'afs-cachefiles-fixes-20210323' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Use wait_on_page_writeback_killable
  mm/writeback: Add wait_on_page_writeback_killable
  fs/cachefiles: Remove wait_bit_key layout dependency
2021-03-24 10:22:00 -07:00
Christian Brauner
bf1c82a538 cachefiles: do not yet allow on idmapped mounts
Based on discussions (e.g. in [1]) my understanding of cachefiles and
the cachefiles userspace daemon is that it creates a cache on a local
filesystem (e.g. ext4, xfs etc.) for a network filesystem. The way this
is done is by writing "bind" to /dev/cachefiles and pointing it to the
directory to use as the cache.

Currently this directory can technically also be an idmapped mount but
cachefiles aren't yet fully aware of such mounts and thus don't take the
idmapping into account when creating cache entries. This could leave
users confused as the ownership of the files wouldn't match to what they
expressed in the idmapping. Block cache files on idmapped mounts until
the fscache rework is done and we have ported it to support idmapped
mounts.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/lkml/20210303161528.n3jzg66ou2wa43qb@wittgenstein [1]
Link: https://lore.kernel.org/r/20210316112257.2974212-1-christian.brauner@ubuntu.com/ # v1
Link: https://listman.redhat.com/archives/linux-cachefs/2021-March/msg00044.html # v2
Link: https://lore.kernel.org/r/20210319114146.410329-1-christian.brauner@ubuntu.com/ # v3
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-24 10:20:22 -07:00
Pavel Begunkov
a185f1db59 io_uring: do ctx sqd ejection in a clear context
WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
Call Trace:
 io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
 io_uring_release+0x3e/0x50 fs/io_uring.c:8646
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 io_run_task_work fs/io_uring.c:2238 [inline]
 io_run_task_work fs/io_uring.c:2228 [inline]
 io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so
fput callback (i.e. io_uring_release()) is enqueued through task_work,
and run by same cancellation. As it's deeply nested we can't do parking
or taking sqd->lock there, because its state is unclear. So avoid
ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it
in a clear context in io_ring_exit_work().

Fixes: f6d54255f4 ("io_uring: halt SQO submission on ctx exit")
Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-24 06:55:11 -06:00
Matthew Wilcox (Oracle)
75b6979961 afs: Use wait_on_page_writeback_killable
Open-coding this function meant it missed out on the recent bugfix
for waiters being woken by a delayed wake event from a previous
instantiation of the page[1].

[DH: Changed the patch to use vmf->page rather than variable page which
 doesn't exist yet upstream]

Fixes: 1cf7a1518a ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-4-willy@infradead.org
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [1]
2021-03-23 20:54:37 +00:00
Matthew Wilcox (Oracle)
39f985c8f6 fs/cachefiles: Remove wait_bit_key layout dependency
Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile.  Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility

A comment on the need to maintain structure layout equivalence was added by
Linus[1] and that is no longer applicable.

Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-cachefs@redhat.com
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]
2021-03-23 20:54:29 +00:00
Chris Chiu
5116784039 block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
The GD_NEED_PART_SCAN is set by bdev_check_media_change to initiate
a partition scan while removing a block device. It should be cleared
after blk_drop_paritions because blk_drop_paritions could return
-EBUSY and then the consequence __blkdev_get has no chance to do
delete_partition if GD_NEED_PART_SCAN already cleared.

It causes some problems on some card readers. Ex. Realtek card
reader 0bda:0328 and 0bda:0158. The device node of the partition
will not disappear after the memory card removed. Thus the user
applications can not update the device mapping correctly.

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1920874
Signed-off-by: Chris Chiu <chris.chiu@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323085219.24428-1-chris.chiu@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-23 09:58:34 -06:00
Pavel Begunkov
d81269fecb io_uring: fix provide_buffers sign extension
io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension
problems. Not a huge problem as it's only used for access_ok() and
increases the checked length, but better to keep typing right.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: efe68c1ca8 ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-22 07:41:03 -06:00