Commit Graph

75694 Commits

Author SHA1 Message Date
Linus Torvalds 57ae8a4921 Driver core fixes for 5.18-rc5
Here are some small driver core and kernfs fixes for some reported
 problems.  They include:
 	- kernfs regression that is causing oopses in 5.17 and newer
 	  releases
 	- topology sysfs fixes for a few small reported problems.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYm1QrQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykJQACgj3QhUJxgKSQ6Rri+ODHg4KgDSZsAoIuD3rjq
 5zRFYAcmogYgmN50HNVa
 =2LQM
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core and kernfs fixes for some reported
  problems. They include:

   - kernfs regression that is causing oopses in 5.17 and newer releases

   - topology sysfs fixes for a few small reported problems.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: fix NULL dereferencing in kernfs_remove
  topology: Fix up build warning in topology_is_visible()
  arch_topology: Do not set llc_sibling if llc_id is invalid
  topology: make core_mask include at least cluster_siblings
  topology/sysfs: Hide PPIN on systems that do not support it.
2022-04-30 10:24:21 -07:00
Linus Torvalds 63b7b3ea94 io_uring-5.18-2022-04-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJsLXsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm6xD/4rcDWLQSB9zZB55NmHb1IreOIpibIlGnGJ
 V1IwYCIilKvZuFlkxcD7INDl0JZTvpxWl4fn2ObgPe+PtIc/gdneX3NtnnjwEPdr
 SeDCjhYNoHcB3CoQCg0jjuqLygL0+oVXDer6bpxYSK1U3lLkKCmHtfi+GW3OtpZF
 pVxY4UYNBrMxs9UVhsF3mgd9QEFn2QwmMBYjg4DmsNZy9drfGC07twd4eCNIJIY0
 m+2Y1u0kjstgGxmwYhxbAw6WWkqt+kHU/zbzYXE2pBATABQxECnIw9mfeUrnyu3S
 kUwSAHhcm4qURUUlXj3u6fpDF8EoZo1GpsMo81TqnLpgaWmP80fz24R902f9ycIO
 qQ4xY6SDYZ4rgx1ISoUbyJrIi1dqLaRGUI0KKdQcLw4ZBL2ngXBhkLhpOev6r+T0
 Hx20B9H7IOMBYqAEE9O5VHrJIEDS/xGKlkBbFQFEQ1v+e4TN8aiTGD/0SsU1imeR
 k4T9XD9O7K62iftGGN+2Cz3V2Ag+nbN79B48JlECGEE2zorzcLP5OboLWvG+vRMC
 DgQELPgAW8Zo+VS5EyqosxqO5+8H+wfSnfWYg1hGJRyWyYRnaf9VfgJNBL+rszCq
 g+rJ9NpNZeJVjtsJ/M0jw2lG965CQpZh4zjy0lD04K1bKRynUEUd72ZMUrXZrCb5
 kybLDKoPsg==
 =/R/B
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Pretty boring:

   - three patches just adding reserved field checks (me, Eugene)

   - Fixing a potential regression with IOPOLL caused by a block change
     (Joseph)"

Boring is good.

* tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
  io_uring: check that data field is 0 in ringfd unregister
  io_uring: fix uninitialized field in rw io_kiocb
  io_uring: check reserved fields for recv/recvmsg
  io_uring: check reserved fields for send/sendmsg
2022-04-29 14:51:57 -07:00
Linus Torvalds bd383b8e32 A fix for a NULL dereference that turns out to be easily triggerable
by fsync (marked for stable) and a false positive WARN and snap_rwsem
 locking fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmJsELYTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziwW9CACcunarIMNtKWRRoQjOh/2RUbqEqZaA
 amz5mb6BIkGiZ092UggQ+5SKRJ0eIWayCatMZ5PKpvAMUGpOBgPjQsG1WvqzFzd5
 m84FQ16CsywcD1AYAUlArq9Y59VFQyBXh3kovwDCEywh9F9FPgpDC0MrjeHsBQ0z
 MtsuhzBoLxyVwANV7WFOH2/+U+EPfkK8pNDKluJDy2P6QavsJAI8lk4oEMFgVTPl
 avLdeSC6EIJ8ZwFs//PgGsmjHPLdgA8cEMJEWxa7Sw0zy7+CZpOTuUn95KERIDrc
 7XKc6QdvNdcGSs2boQSFUrfpNV6NHjB7xb0b9fMAqFan9Vb9TFdv2B6x
 =OEJo
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client

Pull ceph client fixes from Ilya Dryomov:
 "A fix for a NULL dereference that turns out to be easily triggerable
  by fsync (marked for stable) and a false positive WARN and snap_rwsem
  locking fixups"

* tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client:
  ceph: fix possible NULL pointer dereference for req->r_session
  ceph: remove incorrect session state check
  ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
  libceph: disambiguate cluster/pool full log message
2022-04-29 14:37:35 -07:00
Eugene Syromiatnikov 303cc749c8 io_uring: check that data field is 0 in ringfd unregister
Only allow data field to be 0 in struct io_uring_rsrc_update user
arguments to allow for future possible usage.

Fixes: e7a6c00dc7 ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-29 08:39:43 -06:00
Joseph Ravichandran 32452a3eb8 io_uring: fix uninitialized field in rw io_kiocb
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.

Fixes: 3e08773c38 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-28 13:13:43 -06:00
Linus Torvalds 4a2316a1ed gfs2 fix
- No short reads or writes upon glock contention
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJqk4IUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpAYA/9GIlIzJWlgvc/H8sKef/kfpRT3G8p
 9oPQFG2QA9HtvoDWBY8Gssnm/oiFpMGciGLpMbpQTXLnIXbP/2jEK4Z2gNXbecJe
 zmaBeqaw3nO72j+/AdW19gBi4WRxeseGMXVoUbqKz/xO1ftaQQrXuUFtHotU0n6F
 xGatgoLg2YRoxnUshN23uvPIZ22DAjBpsl/RVhWSw5SM1ngyjxQ4w++by1UC4JZr
 8f4mlGaWTZCnWfKJwrgmJG/z9jSvp7GwzDRFUDCAg+XFDD+89sf4GwM7vFoVF9Fz
 5/6Sy2I5Pd4vZupHIV6YLu6TvgQFAYvX2BD74mlfrxV3roXeZyZvEhGfeDvMPjWZ
 M0xv9sl3XTGn8F3FEoz9q7axp9IqSRQU3Rgco20csWpEMKE82p13cadXlD3tM3Y1
 K/m40r1T+UT0Zm7OJH2eYQfBVqz+Qyl6IVSqOohIRNz8IuYNv3JgW/HKuZpMKToW
 CGaaHtPiLUX1dF3UIeWrKbvpvnWLB/pDc6o8C53kSsQ7kq6TcFrTbI2+8JcQBBlr
 JaxjD8TKyeSzuV2IaD/XyxpcZgMYxFvi9iRppNsN97O8xOWeG8xbt41AC35sAS7Y
 oaAZnZUgoEcnfK7Qn7CDmqU9N1txzl11WgXZTluJ4sU9Sc/WWpVtWF3pVw5hHxfM
 QeqlCVPTwf6GJsw=
 =Iibp
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - No short reads or writes upon glock contention

* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: No short reads or writes upon glock contention
2022-04-28 09:50:29 -07:00
Linus Torvalds 8061e16e20 xfs: fixes for v5.18-rc5
XFS changes for v5.18-rc5:
 - define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
 - remove redundant XFS fields from MAINTAINERS
 - fix inode buffer locking order regression
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmJnJ0IUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h0UFw//bdCSgHHT8uKEnaad3BJv3H50Alkc
 o7W+NvEyPij4FQVbytphl/1TN6XodIt8dBl+NkYneve7emyVVzB9MdDL2VWSl9nr
 JstswE3hY/v1n2LsRmErmT0jrnmTQbIAvCv6jllnlbR1eqlcK5XVyYAipMi69t3E
 GLREaiq4dpbTqAJ3sBcIs5eJ6O84euiXlIrOolnhFln/B3+jzgexgzzhdHZyJ9+9
 vp7xY0jWdPbPm8QeWNl7Qpebch2qe4gFXXnSTVvYsZNyvL8+FDmYQi6RyAHU15ML
 I26Ub87hV1UhYcVSsMSyfAOQTR2yElIjmBNiKwPnfwOXIdKjvYI72x71YRNwFDS8
 RFyWE8n7z7kpgqvWDuV5o+5pg/7578xfoRO6wXtvidcDAFZiM3OhHS9jz8Tbasx8
 /R+jlB8JM10PKHUiMpdtNQGZU1XkHNPT6eZ7hiuvYZm06SudxKr1yH5d7T5Okq4C
 ciK782LRpKofwgUBX748O5uomYLaBvanTpErW8HP20hb/hRrwx8gXz29O1/ZaCZe
 2muPlAniPdvSq2eTz7wyEFlIjIiswRa8tOt4EjLFNLimIA5Ek0GtoToxcjs9xmr1
 GP4B+5wymjktn/9/F1EwKBkD9WTR4eJn2RbLqm5fdr/5MXS/xfixF6dYl1X5QtF3
 5zRl52y4gU3dPI4=
 =VfQp
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Dave Chinner:

 - define buffer bit flags as unsigned to fix gcc-5 + c11 warnings

 - remove redundant XFS fields from MAINTAINERS

 - fix inode buffer locking order regression

* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reorder iunlink remove operation in xfs_ifree
  MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
  xfs: convert buffer flags to unsigned.
2022-04-28 09:37:56 -07:00
Andreas Gruenbacher 296abc0d91 gfs2: No short reads or writes upon glock contention
Commit 00bfe02f47 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers.  When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.

As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.

Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f47 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-04-28 15:14:48 +02:00
Minchan Kim ad8d869343 kernfs: fix NULL dereferencing in kernfs_remove
kernfs_remove supported NULL kernfs_node param to bail out but revent
per-fs lock change introduced regression that dereferencing the
param without NULL check so kernel goes crash.

This patch checks the NULL kernfs_node in kernfs_remove and if so,
just return.

Quote from bug report by Jirka

```
The bug is triggered by running NAS Parallel benchmark suite on
SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
log:

[  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  247.036009] #PF: supervisor read access in kernel mode
[  247.036009] #PF: error_code(0x0000) - not-present page
[  247.036009] PGD 0 P4D 0
[  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
[  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
2.0b 03/07/2018
[  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
c4 60
[  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
knlGS:0000000000000000
[  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  247.122048] PKRU: 55555554
[  247.122048] Call Trace:
[  247.122048]  <TASK>
[  247.122048]  rdt_kill_sb+0x29d/0x350
[  247.122048]  deactivate_locked_super+0x36/0xa0
[  247.122048]  cleanup_mnt+0x131/0x190
[  247.122048]  task_work_run+0x5c/0x90
[  247.122048]  exit_to_user_mode_prepare+0x229/0x230
[  247.122048]  syscall_exit_to_user_mode+0x18/0x40
[  247.122048]  do_syscall_64+0x48/0x90
[  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  247.122048] RIP: 0033:0x7f01be2d735b
```

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
Fixes: 393c371408 (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Cc: stable@vger.kernel.org
Reported-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Jirka Hladky <jhladky@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27 19:32:07 +02:00
Linus Torvalds 211ed5480a zonefs fixes for 5.18-rc5
Two fixes for rc5:
 
 * Fix inode initialization to make sure that the inode flags are all
   cleared.
 
 * Use zone reset operation instead of close to make sure that the zone
   of an empty sequential file in never in an active state after closing
   the file.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYmjSewAKCRDdoc3SxdoY
 dqd+APkBb/phIqK21E1x1KIXe51WTeL4mhkc0dwkQ6vRGcZOfAEA6cW++S7X1Sqo
 JqORCus5FZEfs59NhI6TBD6BtsIZ9wc=
 =hCgn
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:
 "Two fixes for rc5:

   - Fix inode initialization to make sure that the inode flags are all
     cleared.

   - Use zone reset operation instead of close to make sure that the
     zone of an empty sequential file in never in an active state after
     closing the file"

* tag 'zonefs-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix management of open zones
  zonefs: Clear inode information flags on inode creation
2022-04-27 10:30:29 -07:00
Jens Axboe 5a1e99b61b io_uring: check reserved fields for recv/recvmsg
We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.

Fixes: aa1fa28fc7 ("io_uring: add support for recvmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26 20:48:37 -06:00
Jens Axboe 588faa1ea5 io_uring: check reserved fields for send/sendmsg
We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.

Fixes: 0fa03c624d ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26 20:48:31 -06:00
Linus Torvalds 4fad37d595 gfs2 fix
- Only re-check for direct I/O writes past the end of the file after
   re-acquiring the inode glock.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJn/TcUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqtQg/+KFuiVv0FW8R/Pq6/zoSolOPmP6eC
 P2I2hGFTtUkgnyEtdjunIvpMiDVxvNhFPSJgrkfwx6pf55R67K9LJlHUiv8eRJpc
 GS5YxKEClnTZt7RTS8N/dHFrpIATqjhlPL8k3E7WE/uY72Nkt6yfStljnBU5KYXw
 NmhNWiCutn/nnvhzlfAc5GugmQ6umpaKeo59ClEd4SU+MdqjFSobl8SoUAOasXaK
 4rNAh36R2wwS512Rs/y0+UAu5jNn7kcFvDbOBdJuAeXj26JDMYF2pp/ybp/LTqEr
 vdPO814mRwtk9SP56FZZVHzG90aVlbuX5Dxl8xPlK8Df3J0WvYooKBWIXv6xmBj0
 sYmqQPMGe3ydWUCQRbFBVSw4qzMnxBXZWGDFuCbMaDDZ75+Uidl+H67pOxdqS8X2
 rE4cqKkI02hGz8OEo7YXq0esBxrUWxvKwJcR1G0pIqYCkp77J3S4JeACtplQXnJY
 HUcmpiVOUAvhB7QQgV5DFuwkX5jKPSDd62KHR492+233ONJanGNoReQagiZUsGDp
 fpPkGQ5Rt66Sj+BTLg7yOUPpMvcsYqeIOmFxn1GnKudq0jVkDX4W9YlfSDfnnLO8
 X+1EuYGNGe1jx5luLUKs1XvPYrVSc9yDy3DTbtwrvCuimxNeCPTqOoMU4di3UUgP
 145LcO2Q+M+E+t8=
 =mqGR
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.18-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Only re-check for direct I/O writes past the end of the file after
   re-acquiring the inode glock.

* tag 'gfs2-v5.18-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't re-check for write past EOF unnecessarily
2022-04-26 11:17:18 -07:00
Linus Torvalds fd574a2f84 for-5.18-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmJnGGIACgkQxWXV+ddt
 WDumDw//cE1NcawdnVkEaKr20PetHfzPyFSIIr17nedtnVvWYyOFF/0uJlHNhv8Z
 CZIfJ7fmH/pO5oWPXN84wKNfumDWNwc36QrvoXC67TrKUSiBN8BzL83HvAjGwYFH
 G+LfZXGnVbqq8F1iYkIsuH0Oo1x/N/LPM3s6iZy3O4l8s96u+J4GRnc8Tr0AH4MA
 zgz3fab8Ec378HTG9fvdAQNLxFEe0VatD6WrzILnmM8UgeQK7g73dqH9Ni9gz2DW
 2GDlO6aevQ1G6dm2AJ0ItExnbHH7TfOThkG56Gdqrzb/d39GzrVpeob7QiorETus
 EWS1rXaeikUiD4Bzt/RszUNL80yMN1DjcN3QBkiDf3ShSDFteoHMPw3e6jcQCy1m
 Dxf5oditQqltuFNLeSiVbZEMw2kXqBP7RoPiirF9rdvrDNLHhAE9wu0kpSGSSvT7
 Tyu9JyLw2axU6wGTi1GHAXurlW2ItRRyFAewWWul1lLkuz/6YXI4F/EHm3Mbh6Nh
 pMIFMNr4Oafdx+3Ful8ZA4PynirXub/xVDefcFBibz/PTGEnHG4ZVzRudmVnowh7
 GP2pql1+Y/TFkXdD98V8GWD+E10JAmNCkQSoiggJooNWR28whukmDVX/HY8lGmWg
 DjxwGkte3SltUBWNOTGnO7546hMwOxOPZHENPh+gffYkeMeIxPI=
 =xDWz
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - direct IO fixes:

      - restore passing file offset to correctly calculate checksums
        when repairing on read and bio split happens

      - use correct bio when sumitting IO on zoned filesystem

 - zoned mode fixes:

      - fix selection of device to correctly calculate device
        capabilities when allocating a new bio

      - use a dedicated lock for exclusion during relocation

      - fix leaked plug after failure syncing log

 - fix assertion during scrub and relocation

* tag 'for-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: use dedicated lock for data relocation
  btrfs: fix assertion failure during scrub due to block group reallocation
  btrfs: fix direct I/O writes for split bios on zoned devices
  btrfs: fix direct I/O read repair for split bios
  btrfs: fix and document the zoned device choice in alloc_new_bio
  btrfs: fix leaked plug after failure syncing log on zoned filesystems
2022-04-26 11:10:42 -07:00
Andreas Gruenbacher e57f9af73d gfs2: Don't re-check for write past EOF unnecessarily
Only re-check for direct I/O writes past the end of the file after
re-acquiring the inode glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-04-26 15:38:00 +02:00
Linus Torvalds d615b5416f f2fs-fix-5.18
This includes major bug fixes introduced in 5.18-rc1 and 5.17+.
 
 - Remove obsolete whint_mode (5.18-rc1)
 - Fix IO split issue caused by op_flags change in f2fs (5.18-rc1)
 - Fix a wrong condition check to detect IO failure loop (5.18-rc1)
 - Fix wrong data truncation during roll-forward (5.17+)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmJmzzYACgkQQBSofoJI
 UNKuLw/+M3J/WCNH8Zuysgt3eJ6K9e2jtT/xfj+p/s52lileP4ljtnWJ7x8YaViX
 sMbzLkDL4tPip2SuNEQifW2EmMCJ6BCCWV5+SIK5/rEiwRYuhFygVqQIHn2S8g5O
 gWixusQ/NYaUN6e3Bi/7WjUhWmeV4oyhv6WB+sS3f0zaNs9RGoe/K3yNGeLLWm+/
 0Pz6P2LAARs/N/iSw03KWv+1BzMJxbLC1w38boPbysv5oT59+gtxKGVSxie1cce8
 F5mMt1Z7JplPUKhbjrqYo9LzNzAFIgIt3P6mbclE7ASKi9UYOtiT3nvikj6lygbM
 i9FHIcP6bqtjU7GZ5vVbYDW43pGZN+6Hlz7Fu1I3ix4Z2eyFWc/W1Fl6OmjvjVpj
 t/iafwvvdqm1NChLkJx3EXquDDuhxvKhbuuaTwLpuNt+56OvFJ0e91kZhWwbB3dY
 7y80N+VgB0MvFStWeZD85lMvSYfXmv5dnjCu6+nAxRzlsx+JVs8STN6+KHSiVeoY
 LbtyR1sViO0UGNVZAd8XLs8CIScfxatx059ui0wW+Bh2JOy2p5RW0vUKxFzm9/ZJ
 OtRT2W0fdyutYpfwERxny706cV3wOOOP20a/2NC4HgUVWLYNO6hnLeDK7+WlOMqP
 XHWKpkRzwLtZQKVyFNDtFHoYgQKCltQNNp6t/qJnSGC1Q1Cchfo=
 =nBQL
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
 "This includes major bug fixes introduced in 5.18-rc1 and 5.17+:

   - Remove obsolete whint_mode (5.18-rc1)

   - Fix IO split issue caused by op_flags change in f2fs (5.18-rc1)

   - Fix a wrong condition check to detect IO failure loop (5.18-rc1)

   - Fix wrong data truncation during roll-forward (5.17+)"

* tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: should not truncate blocks during roll-forward recovery
  f2fs: fix wrong condition check when failing metapage read
  f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
  f2fs: remove obsolete whint_mode
2022-04-25 10:53:56 -07:00
Xiubo Li 7acae6183c ceph: fix possible NULL pointer dereference for req->r_session
The request will be inserted into the ci->i_unsafe_dirops before
assigning the req->r_session, so it's possible that we will hit
NULL pointer dereference bug here.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55327
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-25 10:45:50 +02:00
Xiubo Li 396ea16818 ceph: remove incorrect session state check
Once the session is opened the s->s_ttl will be set, and when receiving
a new mdsmap and the MDS map is changed, it will be possibly will close
some sessions and open new ones. And then some sessions will be in
CLOSING state evening without unmounting.

URL: https://tracker.ceph.com/issues/54979
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-25 10:45:15 +02:00
Niels Dossche 7f47f7f3b3 ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
ceph_add_cap says in its function documentation that the caller should
hold the read lock on the session snap_rwsem. Furthermore, not only
ceph_add_cap needs that lock, when it calls to ceph_lookup_snap_realm it
eventually calls ceph_get_snap_realm which states via lockdep that
snap_rwsem needs to be held. handle_cap_export calls ceph_add_cap
without that mdsc->snap_rwsem held. Thus, since ceph_get_snap_realm
and ceph_add_cap both need the lock, the common place to acquire that
lock is inside handle_cap_export.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-04-25 10:45:15 +02:00
Linus Torvalds 22da5264ab 3 fixes to ksmbd server
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJkGaQACgkQiiy9cAdy
 T1G7Pgv/SHmxsBGIT+YyW+ppWZGIMMpCld3YbovtN2EWdt3NXwvnGrTTFkXe0acJ
 +63NjNnCoDTKy+XHN0HhMqLsJSep6J63e5rmiuvUxKQIR0n1WzZl7W1YpHKptZog
 wMkObv0UzKgr6gLEwQeMMqvcyxOVmgssAXou4N8p6rDJLFijM2kcVjpB/B9uyUAR
 JG0ss8lhX7+YTcRuI0QqyulHlUTiGwu4/XjO9oWs3bF2faADVIfZXOKGMFfRpjCK
 YDGdh+HieW5y2SngvstdBuVxmZXLjWjWwe9mQCUaF7khZJ0acuGeQh9BCPNopjUD
 0CBYe9JWM2NxTjXqmhspCGUi40+EedZgIMxukRyl7MrBX1wBF0ErSYpGmYSZQqoH
 u2R9Pr8tUwuzVfQ6s7VhWACckKFNwI53lHOhY1ikc2M/NWeLy961Wi9JRykac8cZ
 ElkJkJUdGttntjwilcKp7NuWmHssDzxNH103WAr3GZrhBDgntUHFBpsLqzQmQW3a
 Gp3jkj1e
 =Y9T8
 -----END PGP SIGNATURE-----

Merge tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:

 - cap maximum sector size reported to avoid mount problems

 - reference count fix

 - fix filename rename race

* tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
  ksmbd: increment reference count of parent fp
  ksmbd: remove filename in ksmbd_file
2022-04-23 17:16:10 -07:00
Linus Torvalds 1f5e98e723 io_uring-5.18-2022-04-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJjYJUQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsZZEADS/dD7pZKxBLcHTCGJAik1/IIv/3ynTrOp
 o86uV0AH+nL6lUyBU7uTTQVFn9Hjh6T10ZfRmcU+1Xb8G4obHTrQJkk5evwCNPng
 9CUW2fwQa+6H4Ui8TU7f1rLcLlm+AUSVmab6h/20X5ldMwzF1JhcE11qtqZw0ti7
 mDPwmxEsx7KMvMy59awA+5IpnXHxe5SvHXuzLsMNwux6dH7VauxE8R+Y+HHVzLWc
 fM2dbEU2Hq5nL23DedMw3ZaHwhQTiWdOQA0386iDB6cJdFv19iw+ApD4KS/qAT2X
 URQ3pmyNOXvOsBosVL4za7VVCoUlA23ZSMoU82p2K3NK4NGfV7S4oIeO5ZR2BK/C
 bIC4c2gutIbYrtdSITBW4z2tj+26BBZS7LaT3Bek/3BL+GjQuM6vK8N4ZhRXPC+l
 vWAwXUnWSyXR4+HWpvm3ewlrSY5CQjfsZgU1PIYybhTf/oo2BxX/HQfRk3XKfLIR
 89gvITTUrC8B4dgPgLs/MF4Ercmoa2//2yL0onBEwdC2b1lRqD/bGM2FAYolzLzf
 W1+BrFj3sRUjexdO7ChrtZvAWo59REAxXdP/3h+NbkIz8sunG2Vpf9NX3cXj30n3
 bZL3SkEbtFxpnXspRSQRnmL6DMLPMa7UC+MxpNBAV0g6aMmuSdxqiIZR10+kIEyR
 PJyRqbRefg==
 =Huq7
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Just two small fixes - one fixing a potential leak for the iovec for
  larger requests added in this cycle, and one fixing a theoretical leak
  with CQE_SKIP and IOPOLL"

* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
  io_uring: fix leaks on IOPOLL and CQE_SKIP
  io_uring: free iovec if file assignment fails
2022-04-23 09:42:13 -07:00
Linus Torvalds c00c5e1d15 Fix some syzbot-detected bugs, as well as other bugs found by I/O
injection testing.  Change ext4's fallocate to update consistently
 drop set[ug]id bits when an fallocate operation might possibly change
 the user-visible contents of a file.  Also, improve handling of
 potentially invalid values in the the s_overhead_cluster superblock
 field to avoid ext4 returning a negative number of free blocks.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmJinf8ACgkQ8vlZVpUN
 gaOHgQf+MKgUZgteYogLzoP3mF1kSycOGawk4wZ3QHOLz7AvsV2p9J8BWihbS/EK
 dBydfXbTMvCUrjWmpqb5dHECRzxdfxOJ0SPJtibc8DZaJc9ImNFmgSp9kyJ3uRaN
 cPGO6Lz2RXpdumVMPPLwzUJdVyrLi0K6I1NYSocxKgribePzd+xil8S9zRZj8Bpe
 RaeH0EytcRj2CI5qs5mI/mOPBAMsZeczd3HInI3gyCgP2I4ZOfsADne3APx57mcI
 IGKf77nvIwMHeKel3MGYfFPitEs5cZpHUhHplCMtgFsO8H0IR93tqnlaCvTM7VAZ
 Slamgl7pfcXFcLZP+pm0QL/82ub7iw==
 =FIds
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix some syzbot-detected bugs, as well as other bugs found by I/O
  injection testing.

  Change ext4's fallocate to consistently drop set[ug]id bits when an
  fallocate operation might possibly change the user-visible contents of
  a file.

  Also, improve handling of potentially invalid values in the the
  s_overhead_cluster superblock field to avoid ext4 returning a negative
  number of free blocks"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix a potential race while discarding reserved buffers after an abort
  ext4: update the cached overhead value in the superblock
  ext4: force overhead calculation if the s_overhead_cluster makes no sense
  ext4: fix overhead calculation to account for the reserved gdt blocks
  ext4, doc: fix incorrect h_reserved size
  ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
  ext4: fix use-after-free in ext4_search_dir
  ext4: fix bug_on in start_this_handle during umount filesystem
  ext4: fix symlink file size not match to file content
  ext4: fix fallocate to use file_modified to update permissions consistently
2022-04-22 18:18:27 -07:00
Linus Torvalds 88c5060d56 4 fixes to cifs client, 2 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJi1i4ACgkQiiy9cAdy
 T1G9qgv/fY5M1iyz7ddwSbFeX2MPL3225C/LuWf1cW/deUuf83/RyjefaI/8GSjG
 DT+0mvpow/vvOWWxS+ZR+O3vp32Q/twO4NhgTU5FQicACU1AmofcY/klVUCm1W9f
 oS6q/dlvWaqlqDLajprPOEuh5ZG01GlkmF0lTvo00ze40PlDfpXZEpmoCHtNPUx9
 VUFvq+Bxh+3kvWaHk7YKN0ENilSZ9UYy1KIClpQXlXU4rRwT9MORiljwXB+FYw9E
 suKsAyUk7ek01TBFUjcRC7OYjlq2t7wcvsOlLLwvMlisgauJZgPHAv1igJF04fVw
 jMIE9RwH5EuPHP5JrVj+w3mkk9XXG4xbXREfmNCC4V/V1a3ZefU3r0lj1VBV4HIi
 p2FpVvEBU+DkFuO8pwA2xv56ykCXooAftk1xX9Rj27ICEL/OPqiXbEdb46wTT+Nj
 cf8L0qumOvV4/IMRre94RJCcwn4IfTW0O4VGCAhk3U+qR1MZzmGtpvjzMVdUOad+
 C7jaDuwd
 =XdOO
 -----END PGP SIGNATURE-----

Merge tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Four fixes, two of them for stable:

   - fcollapse fix

   - reconnect lock fix

   - DFS oops fix

   - minor cleanup patch"

* tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: destage any unwritten data to the server before calling copychunk_write
  cifs: use correct lock type in cifs_reconnect()
  cifs: fix NULL ptr dereference in refresh_mounts()
  cifs: Use kzalloc instead of kmalloc/memset
2022-04-22 13:26:11 -07:00
Linus Torvalds 279b83c673 fs.fixes.v5.18-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYmF+/wAKCRCRxhvAZXjc
 otmDAP47jPBjTS+gdnMy8fP6ymZu2+So3gex8N777x23mTvQ5AEAy9s4tKMb5UA5
 rwQa8vRgkUBAyiB9yjloNOdN65X1cAE=
 =WLsw
 -----END PGP SIGNATURE-----

Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount_setattr fix from Christian Brauner:
 "The recent cleanup in e257039f0f ("mount_setattr(): clean the
  control flow and calling conventions") switched the mount attribute
  codepaths from do-while to for loops as they are more idiomatic when
  walking mounts.

  However, we did originally choose do-while constructs because if we
  request a mount or mount tree to be made read-only we need to hold
  writers in the following way: The mount attribute code will grab
  lock_mount_hash() and then call mnt_hold_writers() which will
  _unconditionally_ set MNT_WRITE_HOLD on the mount.

  Any callers that need write access have to call mnt_want_write(). They
  will immediately see that MNT_WRITE_HOLD is set on the mount and the
  caller will then either spin (on non-preempt-rt) or wait on
  lock_mount_hash() (on preempt-rt).

  The fact that MNT_WRITE_HOLD is set unconditionally means that once
  mnt_hold_writers() returns we need to _always_ pair it with
  mnt_unhold_writers() in both the failure and success paths.

  The do-while constructs did take care of this. But Al's change to a
  for loop in the failure path stops on the first mount we failed to
  change mount attributes _without_ going into the loop to call
  mnt_unhold_writers().

  This in turn means that once we failed to make a mount read-only via
  mount_setattr() - i.e. there are already writers on that mount - we
  will block any writers indefinitely. Fix this by ensuring that the for
  loop always unsets MNT_WRITE_HOLD including the first mount we failed
  to change to read-only. Also sprinkle a few comments into the cleanup
  code to remind people about what is happening including myself. After
  all, I didn't catch it during review.

  This is only relevant on mainline and was reported by syzbot. Details
  about the syzbot reports are all in the commit message"

* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: unset MNT_WRITE_HOLD on failure
2022-04-22 13:17:19 -07:00
Christophe Leroy 5f24d5a579 mm, hugetlb: allow for "high" userspace addresses
This is a fix for commit f6795053da ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.

This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).

Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area().  To allow that, move those two macros out
of mm/mmap.c into include/linux/sched/mm.h

If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Catalin (ARM64) said
 "We should have fixed hugetlb_get_unmapped_area() as well when we added
  support for 52-bit VA. The reason for commit f6795053da was to
  prevent normal mmap() from returning addresses above 48-bit by default
  as some user-space had hard assumptions about this.

  It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
  but I doubt anyone would notice. It's more likely that the current
  behaviour would cause issues, so I'd rather have them consistent.

  Basically when arm64 gained support for 52-bit addresses we did not
  want user-space calling mmap() to suddenly get such high addresses,
  otherwise we could have inadvertently broken some programs (similar
  behaviour to x86 here). Hence we added commit f6795053da. But we
  missed hugetlbfs which could still get such high mmap() addresses. So
  in theory that's a potential regression that should have bee addressed
  at the same time as commit f6795053da (and before arm64 enabled
  52-bit addresses)"

Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu
Fixes: f6795053da ("mm: mmap: Allow for "high" userspace addresses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>	[5.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21 20:01:09 -07:00
Jaegeuk Kim 4d8ec91208 f2fs: should not truncate blocks during roll-forward recovery
If the file preallocated blocks and fsync'ed, we should not truncate them during
roll-forward recovery which will recover i_size correctly back.

Fixes: d4dd19ec1e ("f2fs: do not expose unwritten blocks to user by DIO")
Cc: <stable@vger.kernel.org> # 5.17+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-21 18:57:09 -07:00
Ye Bin 23e3d7f706 jbd2: fix a potential race while discarding reserved buffers after an abort
we got issue as follows:
[   72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
[   72.826847] EXT4-fs (sda): Remounting filesystem read-only
fallocate: fallocate failed: Read-only file system
[   74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
[   74.793597] ------------[ cut here ]------------
[   74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
[   74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
[   74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
[   74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
[   74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
[   74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
[   74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
[   74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
[   74.807301] FS:  0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
[   74.808338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
[   74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   74.811897] Call Trace:
[   74.812241]  <TASK>
[   74.812566]  __jbd2_journal_refile_buffer+0x12f/0x180
[   74.813246]  jbd2_journal_refile_buffer+0x4c/0xa0
[   74.813869]  jbd2_journal_commit_transaction.cold+0xa1/0x148
[   74.817550]  kjournald2+0xf8/0x3e0
[   74.819056]  kthread+0x153/0x1c0
[   74.819963]  ret_from_fork+0x22/0x30

Above issue may happen as follows:
        write                   truncate                   kjournald2
generic_perform_write
 ext4_write_begin
  ext4_walk_page_buffers
   do_journal_get_write_access ->add BJ_Reserved list
 ext4_journalled_write_end
  ext4_walk_page_buffers
   write_end_fn
    ext4_handle_dirty_metadata
                ***************JBD2 ABORT**************
     jbd2_journal_dirty_metadata
 -> return -EROFS, jh in reserved_list
                                                   jbd2_journal_commit_transaction
                                                    while (commit_transaction->t_reserved_list)
                                                      jh = commit_transaction->t_reserved_list;
                        truncate_pagecache_range
                         do_invalidatepage
			  ext4_journalled_invalidatepage
			   jbd2_journal_invalidatepage
			    journal_unmap_buffer
			     __dispose_buffer
			      __jbd2_journal_unfile_buffer
			       jbd2_journal_put_journal_head ->put last ref_count
			        __journal_remove_journal_head
				 bh->b_private = NULL;
				 jh->b_bh = NULL;
				                      jbd2_journal_refile_buffer(journal, jh);
							bh = jh2bh(jh);
							->bh is NULL, later will trigger null-ptr-deref
				 journal_free_journal_head(jh);

After commit 96f1e09745, we no longer hold the j_state_lock while
iterating over the list of reserved handles in
jbd2_journal_commit_transaction().  This potentially allows the
journal_head to be freed by journal_unmap_buffer while the commit
codepath is also trying to free the BJ_Reserved buffers.  Keeping
j_state_lock held while trying extends hold time of the lock
minimally, and solves this issue.

Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-21 14:21:30 -04:00
Christian Brauner 0014edaedf
fs: unset MNT_WRITE_HOLD on failure
After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD
and consequently we always need to pair mnt_hold_writers() with
mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a
do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for
the first mount that was changed. Fix this and make sure that the first mount
will be cleaned up and add some comments to make it more obvious.

Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com
Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com
Link: https://lore.kernel.org/r/20220420131925.2464685-1-brauner@kernel.org
Fixes: e257039f0f ("mount_setattr(): clean the control flow and calling conventions") [1]
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com
Reported-by: syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-04-21 17:57:37 +02:00
Naohiro Aota 5f0addf7b8 btrfs: zoned: use dedicated lock for data relocation
Currently, we use btrfs_inode_{lock,unlock}() to grant an exclusive
writeback of the relocation data inode in
btrfs_zoned_data_reloc_{lock,unlock}(). However, that can cause a deadlock
in the following path.

Thread A takes btrfs_inode_lock() and waits for metadata reservation by
e.g, waiting for writeback:

prealloc_file_extent_cluster()
  - btrfs_inode_lock(&inode->vfs_inode, 0);
  - btrfs_prealloc_file_range()
  ...
    - btrfs_replace_file_extents()
      - btrfs_start_transaction
      ...
        - btrfs_reserve_metadata_bytes()

Thread B (e.g, doing a writeback work) needs to wait for the inode lock to
continue writeback process:

do_writepages
  - btrfs_writepages
    - extent_writpages
      - btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
        - btrfs_inode_lock()

The deadlock is caused by relying on the vfs_inode's lock. By using it, we
introduced unnecessary exclusion of writeback and
btrfs_prealloc_file_range(). Also, the lock at this point is useless as we
don't have any dirty pages in the inode yet.

Introduce fs_info->zoned_data_reloc_io_lock and use it for the exclusive
writeback.

Fixes: 35156d8527 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
CC: stable@vger.kernel.org # 5.16.x: 869f4cdc73f9: btrfs: zoned: encapsulate inode locking for zoned relocation
CC: stable@vger.kernel.org # 5.16.x
CC: stable@vger.kernel.org # 5.17
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-21 16:06:24 +02:00
Filipe Manana a692e13d87 btrfs: fix assertion failure during scrub due to block group reallocation
During a scrub, or device replace, we can race with block group removal
and allocation and trigger the following assertion failure:

[7526.385524] assertion failed: cache->start == chunk_offset, in fs/btrfs/scrub.c:3817
[7526.387351] ------------[ cut here ]------------
[7526.387373] kernel BUG at fs/btrfs/ctree.h:3599!
[7526.388001] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[7526.388970] CPU: 2 PID: 1158150 Comm: btrfs Not tainted 5.17.0-rc8-btrfs-next-114 #4
[7526.390279] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[7526.392430] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[7526.393520] Code: f3 48 c7 c7 20 (...)
[7526.396926] RSP: 0018:ffffb9154176bc40 EFLAGS: 00010246
[7526.397690] RAX: 0000000000000048 RBX: ffffa0db8a910000 RCX: 0000000000000000
[7526.398732] RDX: 0000000000000000 RSI: ffffffff9d7239a2 RDI: 00000000ffffffff
[7526.399766] RBP: ffffa0db8a911e10 R08: ffffffffa71a3ca0 R09: 0000000000000001
[7526.400793] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa0db4b170800
[7526.401839] R13: 00000003494b0000 R14: ffffa0db7c55b488 R15: ffffa0db8b19a000
[7526.402874] FS:  00007f6c99c40640(0000) GS:ffffa0de6d200000(0000) knlGS:0000000000000000
[7526.404038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7526.405040] CR2: 00007f31b0882160 CR3: 000000014b38c004 CR4: 0000000000370ee0
[7526.406112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7526.407148] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7526.408169] Call Trace:
[7526.408529]  <TASK>
[7526.408839]  scrub_enumerate_chunks.cold+0x11/0x79 [btrfs]
[7526.409690]  ? do_wait_intr_irq+0xb0/0xb0
[7526.410276]  btrfs_scrub_dev+0x226/0x620 [btrfs]
[7526.410995]  ? preempt_count_add+0x49/0xa0
[7526.411592]  btrfs_ioctl+0x1ab5/0x36d0 [btrfs]
[7526.412278]  ? __fget_files+0xc9/0x1b0
[7526.412825]  ? kvm_sched_clock_read+0x14/0x40
[7526.413459]  ? lock_release+0x155/0x4a0
[7526.414022]  ? __x64_sys_ioctl+0x83/0xb0
[7526.414601]  __x64_sys_ioctl+0x83/0xb0
[7526.415150]  do_syscall_64+0x3b/0xc0
[7526.415675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[7526.416408] RIP: 0033:0x7f6c99d34397
[7526.416931] Code: 3c 1c e8 1c ff (...)
[7526.419641] RSP: 002b:00007f6c99c3fca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[7526.420735] RAX: ffffffffffffffda RBX: 00005624e1e007b0 RCX: 00007f6c99d34397
[7526.421779] RDX: 00005624e1e007b0 RSI: 00000000c400941b RDI: 0000000000000003
[7526.422820] RBP: 0000000000000000 R08: 00007f6c99c40640 R09: 0000000000000000
[7526.423906] R10: 00007f6c99c40640 R11: 0000000000000246 R12: 00007fff746755de
[7526.424924] R13: 00007fff746755df R14: 0000000000000000 R15: 00007f6c99c40640
[7526.425950]  </TASK>

That assertion is relatively new, introduced with commit d04fbe19ae
("btrfs: scrub: cleanup the argument list of scrub_chunk()").

The block group we get at scrub_enumerate_chunks() can actually have a
start address that is smaller then the chunk offset we extracted from a
device extent item we got from the commit root of the device tree.
This is very rare, but it can happen due to a race with block group
removal and allocation. For example, the following steps show how this
can happen:

1) We are at transaction T, and we have the following blocks groups,
   sorted by their logical start address:

   [ bg A, start address A, length 1G (data) ]
   [ bg B, start address B, length 1G (data) ]
   (...)
   [ bg W, start address W, length 1G (data) ]

     --> logical address space hole of 256M,
         there used to be a 256M metadata block group here

   [ bg Y, start address Y, length 256M (metadata) ]

      --> Y matches W's end offset + 256M

   Block group Y is the block group with the highest logical address in
   the whole filesystem;

2) Block group Y is deleted and its extent mapping is removed by the call
   to remove_extent_mapping() made from btrfs_remove_block_group().

   So after this point, the last element of the mapping red black tree,
   its rightmost node, is the mapping for block group W;

3) While still at transaction T, a new data block group is allocated,
   with a length of 1G. When creating the block group we do a call to
   find_next_chunk(), which returns the logical start address for the
   new block group. This calls returns X, which corresponds to the
   end offset of the last block group, the rightmost node in the mapping
   red black tree (fs_info->mapping_tree), plus one.

   So we get a new block group that starts at logical address X and with
   a length of 1G. It spans over the whole logical range of the old block
   group Y, that was previously removed in the same transaction.

   However the device extent allocated to block group X is not the same
   device extent that was used by block group Y, and it also does not
   overlap that extent, which must be always the case because we allocate
   extents by searching through the commit root of the device tree
   (otherwise it could corrupt a filesystem after a power failure or
   an unclean shutdown in general), so the extent allocator is behaving
   as expected;

4) We have a task running scrub, currently at scrub_enumerate_chunks().
   There it searches for device extent items in the device tree, using
   its commit root. It finds a device extent item that was used by
   block group Y, and it extracts the value Y from that item into the
   local variable 'chunk_offset', using btrfs_dev_extent_chunk_offset();

   It then calls btrfs_lookup_block_group() to find block group for
   the logical address Y - since there's currently no block group that
   starts at that logical address, it returns block group X, because
   its range contains Y.

   This results in triggering the assertion:

      ASSERT(cache->start == chunk_offset);

   right before calling scrub_chunk(), as cache->start is X and
   chunk_offset is Y.

This is more likely to happen of filesystems not larger than 50G, because
for these filesystems we use a 256M size for metadata block groups and
a 1G size for data block groups, while for filesystems larger than 50G,
we use a 1G size for both data and metadata block groups (except for
zoned filesystems). It could also happen on any filesystem size due to
the fact that system block groups are always smaller (32M) than both
data and metadata block groups, but these are not frequently deleted, so
much less likely to trigger the race.

So make scrub skip any block group with a start offset that is less than
the value we expect, as that means it's a new block group that was created
in the current transaction. It's pointless to continue and try to scrub
its extents, because scrub searches for extents using the commit root, so
it won't find any. For a device replace, skip it as well for the same
reasons, and we don't need to worry about the possibility of extents of
the new block group not being to the new device, because we have the write
duplication setup done through btrfs_map_block().

Fixes: d04fbe19ae ("btrfs: scrub: cleanup the argument list of scrub_chunk()")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-21 16:06:19 +02:00
Ronnie Sahlberg f5d0f921ea cifs: destage any unwritten data to the server before calling copychunk_write
because the copychunk_write might cover a region of the file that has not yet
been sent to the server and thus fail.

A simple way to reproduce this is:
truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile

the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with
the 'fcollapse 16k 24k' due to write-back caching.

fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call
and it will fail serverside since the file is still 0b in size serverside
until the writes have been destaged.
To avoid this we must ensure that we destage any unwritten data to the
server before calling COPYCHUNK_WRITE.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:54 -05:00
Paulo Alcantara cd70a3e898 cifs: use correct lock type in cifs_reconnect()
TCP_Server_Info::origin_fullpath and TCP_Server_Info::leaf_fullpath
are protected by refpath_lock mutex and not cifs_tcp_ses_lock
spinlock.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:39 -05:00
Paulo Alcantara 41f10081a9 cifs: fix NULL ptr dereference in refresh_mounts()
Either mount(2) or automount might not have server->origin_fullpath
set yet while refresh_cache_worker() is attempting to refresh DFS
referrals.  Add missing NULL check and locking around it.

This fixes bellow crash:

[ 1070.276835] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 1070.277676] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 1070.278219] CPU: 1 PID: 8506 Comm: kworker/u8:1 Not tainted 5.18.0-rc3 #10
[ 1070.278701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[ 1070.279495] Workqueue: cifs-dfscache refresh_cache_worker [cifs]
[ 1070.280044] RIP: 0010:strcasecmp+0x34/0x150
[ 1070.280359] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
[ 1070.281729] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
[ 1070.282114] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
[ 1070.282691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1070.283273] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
[ 1070.283857] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
[ 1070.284436] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
[ 1070.284990] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
[ 1070.285625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1070.286100] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
[ 1070.286683] Call Trace:
[ 1070.286890]  <TASK>
[ 1070.287070]  refresh_cache_worker+0x895/0xd20 [cifs]
[ 1070.287475]  ? __refresh_tcon.isra.0+0xfb0/0xfb0 [cifs]
[ 1070.287905]  ? __lock_acquire+0xcd1/0x6960
[ 1070.288247]  ? is_dynamic_key+0x1a0/0x1a0
[ 1070.288591]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 1070.289012]  ? lock_downgrade+0x6f0/0x6f0
[ 1070.289318]  process_one_work+0x7bd/0x12d0
[ 1070.289637]  ? worker_thread+0x160/0xec0
[ 1070.289970]  ? pwq_dec_nr_in_flight+0x230/0x230
[ 1070.290318]  ? _raw_spin_lock_irq+0x5e/0x90
[ 1070.290619]  worker_thread+0x5ac/0xec0
[ 1070.290891]  ? process_one_work+0x12d0/0x12d0
[ 1070.291199]  kthread+0x2a5/0x350
[ 1070.291430]  ? kthread_complete_and_exit+0x20/0x20
[ 1070.291770]  ret_from_fork+0x22/0x30
[ 1070.292050]  </TASK>
[ 1070.292223] Modules linked in: bpfilter cifs cifs_arc4 cifs_md4
[ 1070.292765] ---[ end trace 0000000000000000 ]---
[ 1070.293108] RIP: 0010:strcasecmp+0x34/0x150
[ 1070.293471] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
[ 1070.297718] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
[ 1070.298622] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
[ 1070.299428] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1070.300296] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
[ 1070.301204] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
[ 1070.301932] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
[ 1070.302645] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
[ 1070.303462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1070.304131] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
[ 1070.305004] Kernel panic - not syncing: Fatal exception
[ 1070.305711] Kernel Offset: disabled
[ 1070.305971] ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:17 -05:00
Damien Le Moal 1da18a296f zonefs: Fix management of open zones
The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.

However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.

Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
2022-04-21 08:39:20 +09:00
Damien Le Moal 694852ead2 zonefs: Clear inode information flags on inode creation
Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
2022-04-21 08:39:10 +09:00
Dave Chinner 9a5280b312 xfs: reorder iunlink remove operation in xfs_ifree
The O_TMPFILE creation implementation creates a specific order of
operations for inode allocation/freeing and unlinked list
modification. Currently both are serialised by the AGI, so the order
doesn't strictly matter as long as the are both in the same
transaction.

However, if we want to move the unlinked list insertions largely out
from under the AGI lock, then we have to be concerned about the
order in which we do unlinked list modification operations.
O_TMPFILE creation tells us this order is inode allocation/free,
then unlinked list modification.

Change xfs_ifree() to use this same ordering on unlinked list
removal. This way we always guarantee that when we enter the
iunlinked list removal code from this path, we already have the AGI
locked and we don't have to worry about lock nesting AGI reads
inside unlink list locks because it's already locked and attached to
the transaction.

We can do this safely as the inode freeing and unlinked list removal
are done in the same transaction and hence are atomic operations
with respect to log recovery.


Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Fixes: 298f7bec50 ("xfs: pin inode backing buffer to the inode log item")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 08:45:16 +10:00
Dave Chinner b9b3fe152e xfs: convert buffer flags to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned. This manifests as a compiler error such as:

/kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk'
  TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d "
  ^
/kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags'
     __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
     ^
/kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED'
  { XBF_UNMAPPED,  "UNMAPPED" }
    ^
/kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS'
     __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
                                        ^
/kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class':
/kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant
 #define XBF_UNMAPPED  (1 << 31)/* do not map the buffer */

as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an
unsigned long for the flag. Since this results in the value of
XBF_UNMAPPED causing a signed integer overflow, the result is
technically undefined behavior, which gcc-5 does not accept as an
integer constant.

This is based on a patch from Arnd Bergman <arnd@arndb.de>.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 08:44:59 +10:00
Linus Torvalds 10c5f102e2 Changes since last update:
- Fix use-after-free of the on-stack z_erofs_decompressqueue;
 
  - Fix sysfs documentation Sphinx warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYl2A0REceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBEe7AQCh7aNhYEncBnoHvrB276HCP0xmMwmc0gPq
 6UeSWgarpAEArgfgLyo8tFgS+gvXbhB8/P1FqEMwaRZVLWathXID3QU=
 =GFAS
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "One patch to fix a use-after-free race related to the on-stack
  z_erofs_decompressqueue, which happens very rarely but needs to be
  fixed properly soon.

  The other patch fixes some sysfs Sphinx warnings"

* tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
  erofs: fix use-after-free of on-stack io[]
2022-04-20 12:35:20 -07:00
Linus Torvalds 906f904097 Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"
This reverts commit 5a519c8fe4.

It turns out that making the pipe almost arbitrarily large has some
rather unexpected downsides.  The kernel test robot reports a kernel
warning that is due to pipe->max_usage now growing to the point where
the iter_file_splice_write() buffer allocation can no longer be
satisfied as a slab allocation, and the

        int nbufs = pipe->max_usage;
        struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
                                        GFP_KERNEL);

code sequence there will now always fail as a result.

That code could be modified to use kvcalloc() too, but I feel very
uncomfortable making those kinds of changes for a very niche use case
that really should have other options than make these kinds of
fundamental changes to pipe behavior.

Maybe the CRIU process dumping should be multi-threaded, and use
multiple pipes and multiple cores, rather than try to use one larger
pipe to minimize splice() calls.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-20 12:07:53 -07:00
Jaegeuk Kim 27275f181c f2fs: fix wrong condition check when failing metapage read
This patch fixes wrong initialization.

Fixes: 50c63009f6 ("f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Jaegeuk Kim 0adc2ab0e8 f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
Let's attach io_flags to bio only, so that we can merge IOs given original
io_flags only.

Fixes: 64bf0eef01 ("f2fs: pass the bio operation to bio_alloc_bioset")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Jaegeuk Kim 930e260763 f2fs: remove obsolete whint_mode
This patch removes obsolete whint_mode.

Fixes: 41d36a9f3e ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Christian Brauner 705191b03d fs: fix acl translation
Last cycle we extended the idmapped mounts infrastructure to support
idmapped mounts of idmapped filesystems (No such filesystem yet exist.).
Since then, the meaning of an idmapped mount is a mount whose idmapping
is different from the filesystems idmapping.

While doing that work we missed to adapt the acl translation helpers.
They still assume that checking for the identity mapping is enough.  But
they need to use the no_idmapping() helper instead.

Note, POSIX ACLs are always translated right at the userspace-kernel
boundary using the caller's current idmapping and the initial idmapping.
The order depends on whether we're coming from or going to userspace.
The filesystem's idmapping doesn't matter at the border.

Consequently, if a non-idmapped mount is passed we need to make sure to
always pass the initial idmapping as the mount's idmapping and not the
filesystem idmapping.  Since it's irrelevant here it would yield invalid
ids and prevent setting acls for filesystems that are mountable in a
userns and support posix acls (tmpfs and fuse).

I verified the regression reported in [1] and verified that this patch
fixes it.  A regression test will be added to xfstests in parallel.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
Fixes: bd303368b7 ("fs: support mapped mounts of mapped filesystems")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # 5.17
Cc: <regressions@lists.linux.dev>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-19 10:19:02 -07:00
Christoph Hellwig 0fdf977d45 btrfs: fix direct I/O writes for split bios on zoned devices
When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_end_dio_bio to record the write location for zone devices is
incorrect for subsequent bios.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:45:04 +02:00
Christoph Hellwig 00d825258b btrfs: fix direct I/O read repair for split bios
When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_check_read_dio_bio is incorrect for subsequent bios.  Add
a file_offset field to struct btrfs_bio to pass along the correct offset.

Given that check_data_csum only uses start of an error message this
means problems with this miscalculation will only show up when I/O fails
or checksums mismatch.

The logic was removed in f4f39fc5dc ("btrfs: remove btrfs_bio::logical
member") but we need it due to the bio splitting.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:56 +02:00
Christoph Hellwig 50f1cff3d8 btrfs: fix and document the zoned device choice in alloc_new_bio
Zone Append bios only need a valid block device in struct bio, but
not the device in the btrfs_bio.  Use the information from
btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on
multi-device file system with non-homogeneous capabilities and remove
the pointless btrfs_bio.device assignment.

Add big fat comments explaining what is going on here.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:49 +02:00
Filipe Manana 50ff57888d btrfs: fix leaked plug after failure syncing log on zoned filesystems
On a zoned filesystem, if we fail to allocate the root node for the log
root tree while syncing the log, we end up returning without finishing
the IO plug we started before, resulting in leaking resources as we
have started writeback for extent buffers of a log tree before. That
allocation failure, which typically is either -ENOMEM or -ENOSPC, is not
fatal and the fsync can safely fallback to a full transaction commit.

So release the IO plug if we fail to allocate the extent buffer for the
root of the log root tree when syncing the log on a zoned filesystem.

Fixes: 3ddebf27fc ("btrfs: zoned: reorder log node allocation on zoned filesystem")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:17 +02:00
Haowen Bai 9339faac6d cifs: Use kzalloc instead of kmalloc/memset
Use kzalloc rather than duplicating its implementation, which
makes code simple and easy to understand.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-18 10:22:57 -05:00
Pavel Begunkov c0713540f6 io_uring: fix leaks on IOPOLL and CQE_SKIP
If all completed requests in io_do_iopoll() were marked with
REQ_F_CQE_SKIP, we'll not only skip CQE posting but also
io_free_batch_list() leaking memory and resources.

Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially
return the value greater than the real one, but iopolling will deal with
it and the userspace will re-iopoll if needed. In anyway, I don't think
there are many use cases for REQ_F_CQE_SKIP + IOPOLL.

Fixes: 83a13a4181 ("io_uring: tweak iopoll CQE_SKIP event counting")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5072fc8693fbfd595f89e5d4305bfcfd5d2f0a64.1650186611.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 06:54:11 -06:00
Jens Axboe 323b190ba2 io_uring: free iovec if file assignment fails
We just return failure in this case, but we need to release the iovec
first. If we're doing IO with more than FAST_IOV segments, then the
iovec is allocated and must be freed.

Reported-by: syzbot+96b43810dfe9c3bb95ed@syzkaller.appspotmail.com
Fixes: 584b0180f0 ("io_uring: move read/write file prep state into actual opcode handler")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-16 21:14:00 -06:00