Commit graph

7000 commits

Author SHA1 Message Date
Ming Lei
5291984004 dm: fix bio polling to handle possibile BLK_STS_AGAIN
Expanded testing of DM's bio polling support (using more fio threads
to dm-linear ontop of null_blk) exposed the possibility for polled
bios to hang (repeatedly polling in io_uring) when null_blk responds
with BLK_STS_AGAIN (due to lack of resources):

1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
   notify kiocb is done, that is the completion interface between block
   layer and io_uring

2) io_complete_rw_iopoll() is called from io_do_iopoll()

3) dm returns BLK_STS_AGAIN for one bio (on behalf of underlying
   driver), then io_complete_rw_iopoll is called, but io_do_iopoll()
   doesn't handle -EAGAIN at all (due to logic in io_rw_should_reissue)

4) reason for dm's BLK_STS_AGAIN is underlying null_blk driver ran out
   of requests (easier to reproduce by setting low hw_queue_depth).

5) dm should handle BLK_STS_AGAIN for POLLED underlying IO, and may
   retry in dm layer.

This fix adds REQ_POLLED specific BLK_STS_AGAIN handling to
dm_io_complete() that clears REQ_POLLED and requeues the bio to DM
using queue_io().

Fixes: b99fdcdc36 ("dm: support bio polling")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 13:23:12 -04:00
Mikulas Patocka
aad5b23ebf dm: fix dm_io and dm_target_io flags race condition on Alpha
Early alpha processors cannot write a single byte or short; they read 8
bytes, modify the value in registers and write back 8 bytes.

This could cause race condition in the structure dm_io - if the fields
flags and io_count are modified simultaneously.

Fix this bug by using 32-bit flags if we are on Alpha and if we are
compiling for a processor that doesn't have the byte-word-extension.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: bd4a6dd241 ("dm: reduce size of dm_io and dm_target_io structs")
[snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 13:19:27 -04:00
Mikulas Patocka
cc09e8a9de dm integrity: set journal entry unused when shrinking device
Commit f6f72f32c2 ("dm integrity: don't replay journal data past the
end of the device") skips journal replay if the target sector points
beyond the end of the device. Unfortunatelly, it doesn't set the
journal entry unused, which resulted in this BUG being triggered:
BUG_ON(!journal_entry_is_unused(je))

Fix this by calling journal_entry_set_unused() for this case.

Fixes: f6f72f32c2 ("dm integrity: don't replay journal data past the end of the device")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
[snitzer: revised header]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 10:31:23 -04:00
Mikulas Patocka
dbdcc906d9 dm ioctl: log an error if the ioctl structure is corrupted
This will help triage bugs when userspace is passing invalid ioctl
structure to the kernel.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[snitzer: log errors using DMERR instead of DMWARN]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 10:29:43 -04:00
Linus Torvalds
561593a048 for-5.18/write-streams-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI1AHwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplPjEACVJzKg5NkxpdkDThvq5tejws9KxB/4mHJg
 NoDMcv1TF+Orsd/HNW6XrgYnbU0ObHom3568/xb8BNegRVFe7V4ME/4IYNRyGOmV
 qbfciu04L1UkJhI52CIidkOioFABL3r1zgLCIz5vk0Cv9X7Le9x0UabHxJf7u9S+
 Z3lNdyxezN0SGx8VT86l/7lSoHtG3VHO9IsQCuNGF02SB+6uGpXBlptbEoQ4nTxd
 T7/H9FNOe2Wf7eKvcOOds8UlvZYAfYcY0GcRrIOXdHIy25mKFWwn5cDgFTMOH5ID
 xXpm+JFkDkrfSW1o4FFPxbN9Z6RbVXbGCsrXlIragLO2MJQdXiIUxS1OPT5oAado
 H9MlX6QtkwziLW9zUWa/N/jmRjc2vzHAxD6JFg/wXxNdtY0kd8TQpaxwTB8mVDPN
 VCGutt7lJS1CQInQ+ppzbdqzzuLHC1RHAyWSmfUE9rb8cbjxtJBnSIorYRLUesMT
 GRwqVTXW0osxSgCb1iDiBCJANrX1yPZcemv4Wh1gzbT6IE9sWxWXsE5sy9KvswNc
 M+E4nu/TYYTfkynItJjLgmDLOoi+V0FBY6ba0mRPBjkriSP4AVlwsZLGVsAHQzuA
 o5paW1GjRCCwhIQ6+AzZIoOz6wqvprBlUgUkUneyYAQ2ZKC3pZi8zPnpoVdFucVa
 VaTzP71C1Q==
 =efaq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block

Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
Linus Torvalds
6f2689a766 SCSI misc on 20220324
This series consists of the usual driver updates (qla2xxx, pm8001,
 libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
 and bug fixes.  The high blast radius core update is the removal of
 write same, which affects block and several non-SCSI devices.  The
 other big change, which is more local, is the removal of the SCSI
 pointer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
 6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
 Ec1HDuUCE18H1H2QAh0=
 =/Ty9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Linus Torvalds
b1f8ccdaae - Significant refactoring and fixing of how DM core does bio-based IO
accounting with focus on fixing wildly inaccurate IO stats for
   dm-crypt (and other DM targets that defer bio submission in their
   own workqueues). End result is proper IO accounting, made possible
   by targets being updated to use the new dm_submit_bio_remap()
   interface.
 
 - Add hipri bio polling support (REQ_POLLED) to bio-based DM.
 
 - Reduce dm_io and dm_target_io structs so that a single dm_io (which
   contains dm_target_io and first clone bio) weighs in at 256 bytes.
   For reference the bio struct is 128 bytes.
 
 - Various other small cleanups, fixes or improvements in DM core and
   targets.
 
 - Update MAINTAINERS with my kernel.org email address to allow
   distinction between my "upstream" and "Red" Hats.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmI7ghIACgkQxSPxCi2d
 A1q04wgAzaRu186WjfCO8MK0uyv3S52Rw1EsgYealAqoPwQJ9KkW2icvjtwRL+fJ
 1+w6qE/Da6QdwXj9lGtp1XIXJFipNJSw3PSaE/tV2cXiBemZlzJ5vR6F6dfeYKmV
 /sGas46H2l+aD4Xr7unUmcN/AYrNIFtnucClY3+DlJFPesXQQc9a/XmL9RX9MrN4
 MS9wLkh/5QSG3zReEct/4GVmNSJAjFfLkkeFHtLN82jvvDmnszRT5+aJ06WkXeOz
 OZmQfOPnJv5MnFUz9DOaRb/fTCoyxzxLnNM5Lt3jyFPk9Jf8Qz9TJ2rgskxsE83u
 UsCD/Y/QAdDcrRVB5SS6+yx4AS6uSA==
 =cinj
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Significant refactoring and fixing of how DM core does bio-based IO
   accounting with focus on fixing wildly inaccurate IO stats for
   dm-crypt (and other DM targets that defer bio submission in their own
   workqueues). End result is proper IO accounting, made possible by
   targets being updated to use the new dm_submit_bio_remap() interface.

 - Add hipri bio polling support (REQ_POLLED) to bio-based DM.

 - Reduce dm_io and dm_target_io structs so that a single dm_io (which
   contains dm_target_io and first clone bio) weighs in at 256 bytes.
   For reference the bio struct is 128 bytes.

 - Various other small cleanups, fixes or improvements in DM core and
   targets.

 - Update MAINTAINERS with my kernel.org email address to allow
   distinction between my "upstream" and "Red" Hats.

* tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits)
  dm: consolidate spinlocks in dm_io struct
  dm: reduce size of dm_io and dm_target_io structs
  dm: switch dm_target_io booleans over to proper flags
  dm: switch dm_io booleans over to proper flags
  dm: update email address in MAINTAINERS
  dm: return void from __send_empty_flush
  dm: factor out dm_io_complete
  dm cache: use dm_submit_bio_remap
  dm: simplify dm_sumbit_bio_remap interface
  dm thin: use dm_submit_bio_remap
  dm: add WARN_ON_ONCE to dm_submit_bio_remap
  dm: support bio polling
  block: add ->poll_bio to block_device_operations
  dm mpath: use DMINFO instead of printk with KERN_INFO
  dm: stop using bdevname
  dm-zoned: remove the ->name field in struct dmz_dev
  dm: remove unnecessary local variables in __bind
  dm: requeue IO if mapping table not yet available
  dm io: remove stale comment block for dm_io()
  dm thin metadata: remove unused dm_thin_remove_block and __remove
  ...
2022-03-24 19:25:24 -07:00
Linus Torvalds
69d1dea852 for-5.18/drivers-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0/QUQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn8GEACRVxJaJV5qjZfoFAQKoAWJEtquwjeARyB+
 0V8ROWHDWHSacdug9wBytayiS1lz2zmUHJ6YXyts2dn0v6CrK4s8yGzk5G/RgH6+
 6M3GmBKjj+r1DfE8L3OoQWkDR1JFPuFxXTG/uBd7fBY2Excih1Z0D2lpspMleIRf
 w8zBrlWrWH8lZlm6HF3fadjEoiWhOM5F4Ofz3eg/PAQrHuD06z8hjQgMeR0jQVzw
 bWF9jrdNIplxRjNWIwCTsQRM+z5KQhUGwDODJjIwdQtVaKSt9D99ZbeKTudlslQ2
 zrizsCq8P1RjBPcrA45FV6QnT9DIRRGrYzHD63qC6fDae34rbzdSHUwRMP2XSxo8
 +hT1AzGypiBauODTPzHFtTskaQ0KibLznEanChh/ThySmNYcEVAljSx3Z5Vo81J+
 IqJYK2m3RESCFruy9w3U/P7qiXZmqYldPfjxAKq8ucg6x1PU3XRAVm7SI/i4l75D
 Crk1ujj2LJgsyxL6qMrK3XUavl1SJdzWeFSarcCt3m4m11EWWfYzmG8Yn8OE2CEZ
 a2CAyDsRi8CZ3hvkaMwigL4wBJjrrig8vyIgok3VrfCmYlNNqMQqM5Rw7vzjR3v1
 cKewI3rQjkFXEaveIXyGPTI/0Da4cT0DOfn/Mws9MDUXNPlFMNEDUZkPuzMywiTB
 2SWDLTe77g==
 =993h
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - NVMe updates via Christoph:
      - add vectored-io support for user-passthrough (Kanchan Joshi)
      - add verbose error logging (Alan Adamson)
      - support buffered I/O on block devices in nvmet (Chaitanya
        Kulkarni)
      - central discovery controller support (Martin Belanger)
      - fix and extended the globally unique idenfier validation
        (Christoph)
      - move away from the deprecated IDA APIs (Sagi Grimberg)
      - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
        Chaitanya Kulkarni)
      - add lockdep annotations for in-kernel sockets (Chris Leech)
      - use vmalloc for ANA log buffer (Hannes Reinecke)
      - kerneldoc fixes (Chaitanya Kulkarni)
      - cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
      - warn about shared namespaces without multipathing (Christoph)

 - MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
   Erik, Dirk)

 - loop cleanups and queue depth configuration (Chaitanya)

 - null_blk cleanups and fixes (Chaitanya)

 - Use descriptive init/exit names in virtio_blk (Randy)

 - Use bvec_kmap_local() in drivers (Christoph)

 - bcache fixes (Mingzhe)

 - xen blk-front persistent grant speedups (Juergen)

 - rnbd fix and cleanup (Gioh)

 - Misc fixes (Christophe, Colin)

* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
  virtio_blk: eliminate anonymous module_init & module_exit
  nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
  nvme: remove nvme_alloc_request and nvme_alloc_request_qid
  nvme: cleanup how disk->disk_name is assigned
  nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
  nvmet: use snprintf() with PAGE_SIZE in configfs
  nvmet: don't fold lines
  nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
  nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
  nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
  nvme-tcp: lockdep: annotate in-kernel sockets
  nvme-tcp: don't fold the line
  nvme-tcp: don't initialize ret variable
  nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
  nvme-multipath: use vmalloc for ANA log buffer
  xen/blkfront: speed up purge_persistent_grants()
  raid5: initialize the stripe_head embeeded bios as needed
  raid5-cache: statically allocate the recovery ra bio
  raid5-cache: fully initialize flush_bio when needed
  raid5-ppl: fully initialize the bio in ppl_new_iounit
  ...
2022-03-21 17:16:01 -07:00
Linus Torvalds
616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Mike Snitzer
4d7bca13dd dm: consolidate spinlocks in dm_io struct
No reason to have separate startio_lock and endio_lock given endio_lock
could be used during submission anyway.

This change leaves the dm_io struct weighing in at 256 bytes (down
from 272 bytes, so saves a cacheline).

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:36 -04:00
Mike Snitzer
bd4a6dd241 dm: reduce size of dm_io and dm_target_io structs
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:35 -04:00
Mike Snitzer
655f3aad7a dm: switch dm_target_io booleans over to proper flags
Add flags to dm_target_io and manage them using the same pattern used
for bi_flags in struct bio.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:35 -04:00
Mike Snitzer
82f6cdcc36 dm: switch dm_io booleans over to proper flags
Add flags to dm_io and manage them using the same pattern used for
bi_flags in struct bio.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:34 -04:00
Mike Snitzer
332f2b1e73 dm: return void from __send_empty_flush
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 16:52:11 -05:00
Mike Snitzer
e27363472f dm: factor out dm_io_complete
Optimizes dm_io_dec_pending() slightly by avoiding local variables.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 16:52:10 -05:00
Mike Snitzer
69596f555b dm cache: use dm_submit_bio_remap
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:57 -05:00
Mike Snitzer
b7f8dff098 dm: simplify dm_sumbit_bio_remap interface
Remove the from_wq argument from dm_sumbit_bio_remap(). Eliminates the
need for dm_sumbit_bio_remap() callers to know whether they are
calling for a workqueue or from the original dm_submit_bio().

Add map_task to dm_io struct, record the map_task in alloc_io and
clear it after all target ->map() calls have completed. Update
dm_sumbit_bio_remap to check if 'current' matches io->map_task rather
than rely on passed 'from_rq' argument.

This change really simplifies the chore of porting each DM target to
using dm_sumbit_bio_remap() because there is no longer the risk of
programming error by not completely knowing all the different contexts
a particular method that calls dm_sumbit_bio_remap() might be used in.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:56 -05:00
Mike Snitzer
a92512819b dm thin: use dm_submit_bio_remap
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:55 -05:00
Mike Snitzer
0a8e9599b9 dm: add WARN_ON_ONCE to dm_submit_bio_remap
If a target uses dm_submit_bio_remap() it should set
ti->accounts_remapped_io.

Also, switch dm_start_io_acct() WARN_ON to WARN_ON_ONCE.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:43 -05:00
Ming Lei
b99fdcdc36 dm: support bio polling
Support bio polling (REQ_POLLED) in the following approach:

1) only support io polling on normal READ/WRITE, and other abnormal IOs
still fallback to IRQ mode, so the target io (and DM's clone bio) is
exactly inside the dm io.

2) hold one refcnt on io->io_count after submitting this dm bio with
REQ_POLLED

3) support dm native bio splitting, any dm io instance associated with
current bio will be added into one list which head is bio->bi_private
which will be recovered before ending this bio

4) implement .poll_bio() callback, call bio_poll() on the single target
bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
dm_io_dec_pending() after the target io is done in .poll_bio()

5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
which is based on Jeffle's previous patch.

These changes are good for a 30-35% IOPS improvement for polled IO.

For detailed test results please see (Jens, thanks for testing!):
https://listman.redhat.com/archives/dm-devel/2022-March/049868.html
or https://marc.info/?l=linux-block&m=164684246214700&w=2

Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-09 12:21:56 -05:00
Christoph Hellwig
03a6b195e8 raid5: initialize the stripe_head embeeded bios as needed
Use bio_init to initialize the bios when needed to the full state
instead of a partial initialization plus later setting of dev and op
and bio_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:13 -08:00
Christoph Hellwig
89f94b6440 raid5-cache: statically allocate the recovery ra bio
There is no need to preallocate the bio and reset it when use.  Just
allocate it on-stack and use a bvec places next to the pages used for
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:09 -08:00
Christoph Hellwig
0dd00cba99 raid5-cache: fully initialize flush_bio when needed
Stop using bio_reset and just initialize the bio fully when needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:04 -08:00
Christoph Hellwig
9f7c3f837a raid5-ppl: fully initialize the bio in ppl_new_iounit
We have all the information to pass the bdev and op directly to bio_init,
so do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:00 -08:00
Eric Dumazet
7d959f6e97 md: use msleep() in md_notify_reboot()
Calling mdelay(1000) from process context, even while a reboot
is in progress, does not make sense.

Using msleep() allows other threads to make progress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:20:21 -08:00
Mariusz Tkaczyk
daae161fd2 md: raid1/raid10: drop pending_cnt
Those counters are not necessary after commit 11bb45e8aaf6 ("md: drop queue
limitation for RAID1 and RAID10"). Remove them from all code (conf and
plug structs). raid1_plug_cb and raid10_plug_cb are identical, so move
definition of raid1_plug_cb to common raid1-10 definitions and use it for
RAID10 too.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 15:16:54 -08:00
Christoph Hellwig
c75e707fe1 block: remove the per-bio/request write hint
With the NVMe support for this gone, there are no consumers of these hints
left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 12:45:57 -07:00
Jens Axboe
b46bebaf2a Merge branch 'for-5.18/drivers' into for-5.18/write-streams
* for-5.18/drivers: (51 commits)
  bcache: fixup multiple threads crash
  bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
  floppy: use memcpy_{to,from}_bvec
  drbd: use bvec_kmap_local in recv_dless_read
  drbd: use bvec_kmap_local in drbd_csum_bio
  bcache: use bvec_kmap_local in bio_csum
  nvdimm-btt: use bvec_kmap_local in btt_rw_integrity
  nvdimm-blk: use bvec_kmap_local in nd_blk_rw_integrity
  zram: use memcpy_from_bvec in zram_bvec_write
  zram: use memcpy_to_bvec in zram_bvec_read
  aoe: use bvec_kmap_local in bvcpy
  iss-simdisk: use bvec_kmap_local in simdisk_submit_bio
  nvme: check that EUI/GUID/UUID are globally unique
  nvme: check for duplicate identifiers earlier
  nvme: fix the check for duplicate unique identifiers
  nvme: cleanup __nvme_check_ids
  nvme: remove nssa from struct nvme_ctrl
  nvme: explicitly set non-error for directives
  nvme: expose cntrltype and dctype through sysfs
  nvme: send uevent on connection up
  ...
2022-03-07 12:44:39 -07:00
Jens Axboe
13400b1454 Merge branch 'for-5.18/block' into for-5.18/write-streams
* for-5.18/block: (96 commits)
  block: remove bio_devname
  ext4: stop using bio_devname
  raid5-ppl: stop using bio_devname
  raid1: stop using bio_devname
  md-multipath: stop using bio_devname
  dm-integrity: stop using bio_devname
  dm-crypt: stop using bio_devname
  pktcdvd: remove a pointless debug check in pkt_submit_bio
  block: remove handle_bad_sector
  block: fix and cleanup bio_check_ro
  bfq: fix use-after-free in bfq_dispatch_request
  blk-crypto: show crypto capabilities in sysfs
  block: don't delete queue kobject before its children
  block: simplify calling convention of elv_unregister_queue()
  block: remove redundant semicolon
  block: default BLOCK_LEGACY_AUTOLOAD to y
  block: update io_ticks when io hang
  block, bfq: don't move oom_bfqq
  block, bfq: avoid moving bfqq to it's parent bfqg
  block, bfq: cleanup bfq_bfqq_to_bfqg()
  ...
2022-03-07 12:44:37 -07:00
Christoph Hellwig
c7dec4623c raid5-ppl: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig
ac483eb375 raid1: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig
ee1925bd83 md-multipath: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig
0a806cfde8 dm-integrity: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Christoph Hellwig
6667171965 dm-crypt: stop using bio_devname
Use the %pg format specifier to save on stack consuption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Mingzhe Zou
887554ab96 bcache: fixup multiple threads crash
When multiple threads to check btree nodes in parallel, the main
thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:

wait_event_interruptible(check_state->wait,
                         atomic_read(&check_state->started) == 0 ||
                         test_bit(CACHE_SET_IO_DISABLE, &c->flags));

However, the bch_btree_node_read and bch_btree_node_read_done
maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
will be set. If the flag already set, the main thread return
error. At the same time, maybe some threads still running and
read NULL pointer, the kernel will crash.

This patch change the event wait condition, the main thread must
wait for all threads to stop.

Fixes: 8e7102273f ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:45 +08:00
Mingzhe Zou
7b1002f7cf bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing
When attaching a cached device (a.k.a backing device) to a cache
device, bch_sectors_dirty_init() is called to count dirty sectors
and stripes (see what bcache_dev_sectors_dirty_add() does) on the
cache device.

When bcache_dev_sectors_dirty_add() is called, set_bit(stripe,
d->full_dirty_stripes) or clear_bit(stripe, d->full_dirty_stripes)
operation will always be performed. In full_dirty_stripes, each 1bit
represents stripe_size (8192) sectors (512B), so 1bit=4MB (8192*512),
and each CPU cache line=64B=512bit=2048MB. When 20 threads process
a cached disk with 100G dirty data, a single thread processes about
23M at a time, and 20 threads total 460M. These full_dirty_stripes
bits corresponding to the 460M data is likely to fall in the same CPU
cache line. When one of these threads performs a set_bit or clear_bit
operation, the same CPU cache line of other threads will become invalid
and must read the full_dirty_stripes from the main memory again. Compared
with single thread, the time of a bcache_dev_sectors_dirty_add()
call is increased by about 50 times in our test (100G dirty data,
20 threads, bcache_dev_sectors_dirty_add() is called more than
20 million times).

This patch tries to test_bit before set_bit or clear_bit operation.
Therefore, a lot of force set and clear operations will be avoided,
and most of bcache_dev_sectors_dirty_add() calls will only read CPU
cache line.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
2022-03-06 22:33:37 +08:00
Christoph Hellwig
07fee7aba5 bcache: use bvec_kmap_local in bio_csum
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Mike Snitzer
168678d765 dm mpath: use DMINFO instead of printk with KERN_INFO
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-02 12:20:22 -05:00
Christoph Hellwig
385411ffba dm: stop using bdevname
Just use the %pg format specifier instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-02 12:15:54 -05:00
Christoph Hellwig
977ff73e64 dm-zoned: remove the ->name field in struct dmz_dev
Just use the %pg format specifier to print the block device name
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-02 12:15:35 -05:00
Christoph Hellwig
a773187e37 scsi: dm: Remove WRITE_SAME support
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Christoph Hellwig
10fa225c33 scsi: md: Remove WRITE_SAME support
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Mike Snitzer
f5b4aee10c dm: remove unnecessary local variables in __bind
Also remove empty newline before 'out:' label at end of __bind.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 13:55:53 -05:00
Mike Snitzer
fa247089de dm: requeue IO if mapping table not yet available
Update both bio-based and request-based DM to requeue IO if the
mapping table not available.

This race of IO being submitted before the DM device ready is so
narrow, yet possible for initial table load given that the DM device's
request_queue is created prior, that it best to requeue IO to handle
this unlikely case.

Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 13:55:52 -05:00
Barry Song
a6a4901a5e dm io: remove stale comment block for dm_io()
Commit 7eaceaccab ("block: remove per-queue plugging") dropped
unplug_delay and blk_unplug(). Plus, the current kernel has no
fundamental difference between sync_io() and async_io() except
sync_io() uses sync_io_complete() as the notify.fn and explicitly
calls wait_for_completion_io() to sync. The comment isn't valid
any more.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 13:55:51 -05:00
Zhiqiang Liu
75274a4bf2 dm thin metadata: remove unused dm_thin_remove_block and __remove
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 13:55:50 -05:00
Wang Qing
8ca8b1e147 dm thin: use time_is_before_jiffies instead of open coding it
Use time_is_before_jiffies() to improve code readability.

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 11:30:50 -05:00
Aashish Sharma
6fc5150438 dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
Explicitly convert unsigned int in the right of the conditional
expression to int to match the left side operand and the return type,
fixing the following compiler warning:

drivers/md/dm-crypt.c:2593:43: warning: signed and unsigned
type in conditional expression [-Wsign-compare]

Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Signed-off-by: Aashish Sharma <shraash@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 11:27:56 -05:00
Kirill Tkhai
588b7f5df0 dm: fix use-after-free in dm_cleanup_zoned_dev()
dm_cleanup_zoned_dev() uses queue, so it must be called
before blk_cleanup_disk() starts its killing:

blk_cleanup_disk->blk_cleanup_queue()->kobject_put()->blk_release_queue()->
->...RCU...->blk_free_queue_rcu()->kmem_cache_free()

Otherwise, RCU callback may be executed first and
dm_cleanup_zoned_dev() will touch free'd memory:

 BUG: KASAN: use-after-free in dm_cleanup_zoned_dev+0x33/0xd0
 Read of size 8 at addr ffff88805ac6e430 by task dmsetup/681

 CPU: 4 PID: 681 Comm: dmsetup Not tainted 5.17.0-rc2+ #6
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  print_address_description.constprop.0+0x1f/0x150
  ? dm_cleanup_zoned_dev+0x33/0xd0
  kasan_report.cold+0x7f/0x11b
  ? dm_cleanup_zoned_dev+0x33/0xd0
  dm_cleanup_zoned_dev+0x33/0xd0
  __dm_destroy+0x26a/0x400
  ? dm_blk_ioctl+0x230/0x230
  ? up_write+0xd8/0x270
  dev_remove+0x156/0x1d0
  ctl_ioctl+0x269/0x530
  ? table_clear+0x140/0x140
  ? lock_release+0xb2/0x750
  ? remove_all+0x40/0x40
  ? rcu_read_lock_sched_held+0x12/0x70
  ? lock_downgrade+0x3c0/0x3c0
  ? rcu_read_lock_sched_held+0x12/0x70
  dm_ctl_ioctl+0xa/0x10
  __x64_sys_ioctl+0xb9/0xf0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fb6dfa95c27

Fixes: bb37d77239 ("dm: introduce zone append emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 11:13:28 -05:00
Jordy Zomer
cd9c88da17 dm ioctl: prevent potential spectre v1 gadget
It appears like cmd could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents of kernel memory
from being leaked to userspace via speculative execution by using
array_index_nospec.

Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 11:04:41 -05:00