for-5.16/block-2021-10-29

-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KDgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQ2D/wO0nH3U+3+OZChi3XUwYck9Dev3o6BANCF
 ClATiK/kivZY0xY1r8J4ixirZo2gcjIMpWSC3JGYZ5LdspfmYGLUbMjfZsaeU23i
 lAKaX1IqfArmHN76k3IU1bKCg7B0/LFwC0q9QTFWTSwNSs8RK/EZLJ61U1hEXUb3
 OfIpaMmvPiMaU7yuPqhcZK14m1cg1srrLM4rFB/PqsWWStF07pHq32WeArGDAU0e
 Fe0YSnYD7qqA5Qc37KwqjCTmmxKX5YZf7etIcA6p3DNmwcuQrVNzKoCH/ZEDijaD
 E2bS/BWbN1x96+rtoEZfBYEaNIrkmJzmW6+fJ53OITbJF3KqP6V66erhqNcFYCzC
 mhFlRe7voXb/8AP7zQqSIhK529BUBM36sQ6nF7EiQcDrfLc1z39mq6eblUxbknIA
 DDPISD5Tseik9N9x0bc7vINseKyHI1E90VAU/XKADcuGbzLvehPx+2p+Iq5ch5Ah
 oa1G3RdlWWQOZxphJHWJhu1qMfo5+FP9dFZj1aoo7b8Kbc/CedyoQe71cpIE5wNh
 Jj/EpWJnuyKXwuTic2VYGC+6ezM9O5DSdqCfP3YuZky95VESyvRCKJYMMgBYRVdC
 /LuxhnBXIY2G8An7ZTnX0kLCCvLbapIwa0NyA98/xeOngO843coJ6wn8ZmE9LJNH
 kMmpCygUrA==
 =QWC+
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - mq-deadline accounting improvements (Bart)

 - blk-wbt timer fix (Andrea)

 - Untangle the block layer includes (Christoph)

 - Rework the poll support to be bio based, which will enable adding
   support for polling for bio based drivers (Christoph)

 - Block layer core support for multi-actuator drives (Damien)

 - blk-crypto improvements (Eric)

 - Batched tag allocation support (me)

 - Request completion batching support (me)

 - Plugging improvements (me)

 - Shared tag set improvements (John)

 - Concurrent queue quiesce support (Ming)

 - Cache bdev in ->private_data for block devices (Pavel)

 - bdev dio improvements (Pavel)

 - Block device invalidation and block size improvements (Xie)

 - Various cleanups, fixes, and improvements (Christoph, Jackie,
   Masahira, Tejun, Yu, Pavel, Zheng, me)

* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
  blk-mq-debugfs: Show active requests per queue for shared tags
  block: improve readability of blk_mq_end_request_batch()
  virtio-blk: Use blk_validate_block_size() to validate block size
  loop: Use blk_validate_block_size() to validate block size
  nbd: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  block: re-flow blk_mq_rq_ctx_init()
  block: prefetch request to be initialized
  block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
  block: add rq_flags to struct blk_mq_alloc_data
  block: add async version of bio_set_polled
  block: kill DIO_MULTI_BIO
  block: kill unused polling bits in __blkdev_direct_IO()
  block: avoid extra iter advance with async iocb
  block: Add independent access ranges support
  blk-mq: don't issue request directly in case that current is to be blocked
  sbitmap: silence data race warning
  blk-cgroup: synchronize blkg creation against policy deactivation
  block: refactor bio_iov_bvec_set()
  block: add single bio async direct IO helper
  ...
This commit is contained in:
Linus Torvalds 2021-11-01 09:19:50 -07:00
commit 33c8846c81
185 changed files with 4964 additions and 4043 deletions

View File

@ -7,230 +7,269 @@ Inline Encryption
Background
==========
Inline encryption hardware sits logically between memory and the disk, and can
en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
fixed number of "keyslots" - slots into which encryption contexts (i.e. the
encryption key, encryption algorithm, data unit size) can be programmed by the
kernel at any time. Each request sent to the disk can be tagged with the index
of a keyslot (and also a data unit number to act as an encryption tweak), and
the inline encryption hardware will en/decrypt the data in the request with the
encryption context programmed into that keyslot. This is very different from
full disk encryption solutions like self encrypting drives/TCG OPAL/ATA
Security standards, since with inline encryption, any block on disk could be
encrypted with any encryption context the kernel chooses.
Inline encryption hardware sits logically between memory and disk, and can
en/decrypt data as it goes in/out of the disk. For each I/O request, software
can control exactly how the inline encryption hardware will en/decrypt the data
in terms of key, algorithm, data unit size (the granularity of en/decryption),
and data unit number (a value that determines the initialization vector(s)).
Some inline encryption hardware accepts all encryption parameters including raw
keys directly in low-level I/O requests. However, most inline encryption
hardware instead has a fixed number of "keyslots" and requires that the key,
algorithm, and data unit size first be programmed into a keyslot. Each
low-level I/O request then just contains a keyslot index and data unit number.
Note that inline encryption hardware is very different from traditional crypto
accelerators, which are supported through the kernel crypto API. Traditional
crypto accelerators operate on memory regions, whereas inline encryption
hardware operates on I/O requests. Thus, inline encryption hardware needs to be
managed by the block layer, not the kernel crypto API.
Inline encryption hardware is also very different from "self-encrypting drives",
such as those based on the TCG Opal or ATA Security standards. Self-encrypting
drives don't provide fine-grained control of encryption and provide no way to
verify the correctness of the resulting ciphertext. Inline encryption hardware
provides fine-grained control of encryption, including the choice of key and
initialization vector for each sector, and can be tested for correctness.
Objective
=========
We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).
We want to support inline encryption in the kernel. To make testing easier, we
also want support for falling back to the kernel crypto API when actual inline
encryption hardware is absent. We also want inline encryption to work with
layered devices like device-mapper and loopback (i.e. we want to be able to use
the inline encryption hardware of the underlying devices if present, or else
fall back to crypto API en/decryption).
Constraints and notes
=====================
- IE hardware has a limited number of "keyslots" that can be programmed
with an encryption context (key, algorithm, data unit size, etc.) at any time.
One can specify a keyslot in a data request made to the device, and the
device will en/decrypt the data using the encryption context programmed into
that specified keyslot. When possible, we want to make multiple requests with
the same encryption context share the same keyslot.
- We need a way for upper layers (e.g. filesystems) to specify an encryption
context to use for en/decrypting a bio, and device drivers (e.g. UFSHCD) need
to be able to use that encryption context when they process the request.
Encryption contexts also introduce constraints on bio merging; the block layer
needs to be aware of these constraints.
- We need a way for upper layers like filesystems to specify an encryption
context to use for en/decrypting a struct bio, and a device driver (like UFS)
needs to be able to use that encryption context when it processes the bio.
- Different inline encryption hardware has different supported algorithms,
supported data unit sizes, maximum data unit numbers, etc. We call these
properties the "crypto capabilities". We need a way for device drivers to
advertise crypto capabilities to upper layers in a generic way.
- We need a way for device drivers to expose their inline encryption
capabilities in a unified way to the upper layers.
- Inline encryption hardware usually (but not always) requires that keys be
programmed into keyslots before being used. Since programming keyslots may be
slow and there may not be very many keyslots, we shouldn't just program the
key for every I/O request, but rather keep track of which keys are in the
keyslots and reuse an already-programmed keyslot when possible.
- Upper layers typically define a specific end-of-life for crypto keys, e.g.
when an encrypted directory is locked or when a crypto mapping is torn down.
At these times, keys are wiped from memory. We must provide a way for upper
layers to also evict keys from any keyslots they are present in.
Design
======
- When possible, device-mapper devices must be able to pass through the inline
encryption support of their underlying devices. However, it doesn't make
sense for device-mapper devices to have keyslots themselves.
We add a struct bio_crypt_ctx to struct bio that can
represent an encryption context, because we need to be able to pass this
encryption context from the upper layers (like the fs layer) to the
device driver to act upon.
Basic design
============
While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.
We introduce ``struct blk_crypto_key`` to represent an inline encryption key and
how it will be used. This includes the actual bytes of the key; the size of the
key; the algorithm and data unit size the key will be used with; and the number
of bytes needed to represent the maximum data unit number the key will be used
with.
We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose its capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.
We introduce ``struct bio_crypt_ctx`` to represent an encryption context. It
contains a data unit number and a pointer to a blk_crypto_key. We add pointers
to a bio_crypt_ctx to ``struct bio`` and ``struct request``; this allows users
of the block layer (e.g. filesystems) to provide an encryption context when
creating a bio and have it be passed down the stack for processing by the block
layer and device drivers. Note that the encryption context doesn't explicitly
say whether to encrypt or decrypt, as that is implicit from the direction of the
bio; WRITE means encrypt, and READ means decrypt.
The KSM uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, it
tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
We also introduce ``struct blk_crypto_profile`` to contain all generic inline
encryption-related state for a particular inline encryption device. The
blk_crypto_profile serves as the way that drivers for inline encryption hardware
advertise their crypto capabilities and provide certain functions (e.g.,
functions to program and evict keys) to upper layers. Each device driver that
wants to support inline encryption will construct a blk_crypto_profile, then
associate it with the disk's request_queue.
The blk_crypto_profile also manages the hardware's keyslots, when applicable.
This happens in the block layer, so that users of the block layer can just
specify encryption contexts and don't need to know about keyslots at all, nor do
device drivers need to care about most details of keyslot management.
blk-mq changes, other block layer changes and blk-crypto-fallback
=================================================================
Specifically, for each keyslot, the block layer (via the blk_crypto_profile)
keeps track of which blk_crypto_key that keyslot contains (if any), and how many
in-flight I/O requests are using it. When the block layer creates a
``struct request`` for a bio that has an encryption context, it grabs a keyslot
that already contains the key if possible. Otherwise it waits for an idle
keyslot (a keyslot that isn't in-use by any I/O), then programs the key into the
least-recently-used idle keyslot using the function the device driver provided.
In both cases, the resulting keyslot is stored in the ``crypt_keyslot`` field of
the request, where it is then accessible to device drivers and is released after
the request completes.
We add a pointer to a ``bi_crypt_context`` and ``keyslot`` to
struct request. These will be referred to as the ``crypto fields``
for the request. This ``keyslot`` is the keyslot into which the
``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
that this request is being sent to.
``struct request`` also contains a pointer to the original bio_crypt_ctx.
Requests can be built from multiple bios, and the block layer must take the
encryption context into account when trying to merge bios and requests. For two
bios/requests to be merged, they must have compatible encryption contexts: both
unencrypted, or both encrypted with the same key and contiguous data unit
numbers. Only the encryption context for the first bio in a request is
retained, since the remaining bios have been verified to be merge-compatible
with the first bio.
We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
blissfully unaware of whether or not real inline encryption hardware is present
underneath. When a bio is submitted with a target ``request_queue`` that doesn't
support the encryption context specified with the bio, the block layer will
en/decrypt the bio with the blk-crypto-fallback.
To make it possible for inline encryption to work with request_queue based
layered devices, when a request is cloned, its encryption context is cloned as
well. When the cloned request is submitted, it is then processed as usual; this
includes getting a keyslot from the clone's target device if needed.
If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
is encrypted stored in the bounce bio - blk-mq will then proceed to process the
bounce bio as if it were not encrypted at all (except when blk-integrity is
concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
internal function that cleans up the bounce bio and ends the original bio.
blk-crypto-fallback
===================
If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
is saved and overwritten by ``blk-crypto-fallback`` to
``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
overwritten with ``NULL``, so that to the rest of the stack, the bio looks
as if it was a regular bio that never had an encryption context specified.
``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
``bi_end_io`` (and also ``bi_private``) and end the bio again.
It is desirable for the inline encryption support of upper layers (e.g.
filesystems) to be testable without real inline encryption hardware, and
likewise for the block layer's keyslot management logic. It is also desirable
to allow upper layers to just always use inline encryption rather than have to
implement encryption in multiple ways.
Regardless of whether real inline encryption hardware is used or the
Therefore, we also introduce *blk-crypto-fallback*, which is an implementation
of inline encryption using the kernel crypto API. blk-crypto-fallback is built
into the block layer, so it works on any block device without any special setup.
Essentially, when a bio with an encryption context is submitted to a
request_queue that doesn't support that encryption context, the block layer will
handle en/decryption of the bio using blk-crypto-fallback.
For encryption, the data cannot be encrypted in-place, as callers usually rely
on it being unmodified. Instead, blk-crypto-fallback allocates bounce pages,
fills a new bio with those bounce pages, encrypts the data into those bounce
pages, and submits that "bounce" bio. When the bounce bio completes,
blk-crypto-fallback completes the original bio. If the original bio is too
large, multiple bounce bios may be required; see the code for details.
For decryption, blk-crypto-fallback "wraps" the bio's completion callback
(``bi_complete``) and private data (``bi_private``) with its own, unsets the
bio's encryption context, then submits the bio. If the read completes
successfully, blk-crypto-fallback restores the bio's original completion
callback and private data, then decrypts the bio's data in-place using the
kernel crypto API. Decryption happens from a workqueue, as it may sleep.
Afterwards, blk-crypto-fallback completes the bio.
In both cases, the bios that blk-crypto-fallback submits no longer have an
encryption context. Therefore, lower layers only see standard unencrypted I/O.
blk-crypto-fallback also defines its own blk_crypto_profile and has its own
"keyslots"; its keyslots contain ``struct crypto_skcipher`` objects. The reason
for this is twofold. First, it allows the keyslot management logic to be tested
without actual inline encryption hardware. Second, similar to actual inline
encryption hardware, the crypto API doesn't accept keys directly in requests but
rather requires that keys be set ahead of time, and setting keys can be
expensive; moreover, allocating a crypto_skcipher can't happen on the I/O path
at all due to the locks it takes. Therefore, the concept of keyslots still
makes sense for blk-crypto-fallback.
Note that regardless of whether real inline encryption hardware or
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming the hardware's implementation
of the algorithm being used adheres to spec and functions correctly).
If a ``request queue``'s inline encryption hardware claimed to support the
encryption context specified with a bio, then it will not be handled by the
``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
struct request needs to be allocated for that bio. At that point,
blk-mq tries to program the encryption context into the ``request_queue``'s
keyslot_manager, and obtain a keyslot, which it stores in its newly added
``keyslot`` field. This keyslot is released when the request is completed.
When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
which sets the request's ``crypt_ctx`` to a copy of the bio's
``bi_crypt_context``. bio_crypt_do_front_merge is called whenever a subsequent
bio is merged to the front of the request, which updates the ``crypt_ctx`` of
the request so that it matches the newly merged bio's ``bi_crypt_context``. In particular, the request keeps a copy of the ``bi_crypt_context`` of the first
bio in its bio-list (blk-mq needs to be careful to maintain this invariant
during bio and request merges).
To make it possible for inline encryption to work with request queue based
layered devices, when a request is cloned, its ``crypto fields`` are cloned as
well. When the cloned request is submitted, blk-mq programs the
``bi_crypt_context`` of the request into the clone's request_queue's keyslot
manager, and stores the returned keyslot in the clone's ``keyslot``.
on-disk format of data) will be the same (assuming that both the inline
encryption hardware's implementation and the kernel crypto API's implementation
of the algorithm being used adhere to spec and function correctly).
blk-crypto-fallback is optional and is controlled by the
``CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK`` kernel configuration option.
API presented to users of the block layer
=========================================
``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
key, the crypto algorithm to use, the data unit size to use, and the number of
bytes required to represent data unit numbers that will be specified with the
``bi_crypt_context``).
``blk_crypto_config_supported()`` allows users to check ahead of time whether
inline encryption with particular crypto settings will work on a particular
request_queue -- either via hardware or via blk-crypto-fallback. This function
takes in a ``struct blk_crypto_config`` which is like blk_crypto_key, but omits
the actual bytes of the key and instead just contains the algorithm, data unit
size, etc. This function can be useful if blk-crypto-fallback is disabled.
``blk_crypto_init_key`` allows upper layers to initialize such a
``blk_crypto_key``.
``blk_crypto_init_key()`` allows users to initialize a blk_crypto_key.
``bio_crypt_set_ctx`` should be called on any bio that a user of
the block layer wants en/decrypted via inline encryption (or the
blk-crypto-fallback, if hardware support isn't available for the desired
crypto configuration). This function takes the ``blk_crypto_key`` and the
data unit number (DUN) to use when en/decrypting the bio.
Users must call ``blk_crypto_start_using_key()`` before actually starting to use
a blk_crypto_key on a request_queue (even if ``blk_crypto_config_supported()``
was called earlier). This is needed to initialize blk-crypto-fallback if it
will be needed. This must not be called from the data path, as this may have to
allocate resources, which may deadlock in that case.
``blk_crypto_config_supported`` allows upper layers to query whether or not the
an encryption context passed to request queue can be handled by blk-crypto
(either by real inline encryption hardware, or by the blk-crypto-fallback).
This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
wants to use an algorithm that may not supported by hardware - this function
lets the upper layer know ahead of time that the algorithm isn't supported,
and the upper layer can fallback to something else if appropriate.
Next, to attach an encryption context to a bio, users should call
``bio_crypt_set_ctx()``. This function allocates a bio_crypt_ctx and attaches
it to a bio, given the blk_crypto_key and the data unit number that will be used
for en/decryption. Users don't need to worry about freeing the bio_crypt_ctx
later, as that happens automatically when the bio is freed or reset.
``blk_crypto_start_using_key`` - Upper layers must call this function on
``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
headed for that ``request_queue``. This function ensures that either the
hardware supports the key's crypto settings, or the crypto API fallback has
transforms for the needed mode allocated and ready to go. Note that this
function may allocate an ``skcipher``, and must not be called from the data
path, since allocating ``skciphers`` from the data path can deadlock.
Finally, when done using inline encryption with a blk_crypto_key on a
request_queue, users must call ``blk_crypto_evict_key()``. This ensures that
the key is evicted from all keyslots it may be programmed into and unlinked from
any kernel data structures it may be linked into.
``blk_crypto_evict_key`` *must* be called by upper layers before a
``blk_crypto_key`` is freed. Further, it *must* only be called only once
there are no more in-flight requests that use that ``blk_crypto_key``.
``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
inline encryption hardware that the key might have been programmed into (or the blk-crypto-fallback).
In summary, for users of the block layer, the lifecycle of a blk_crypto_key is
as follows:
1. ``blk_crypto_config_supported()`` (optional)
2. ``blk_crypto_init_key()``
3. ``blk_crypto_start_using_key()``
4. ``bio_crypt_set_ctx()`` (potentially many times)
5. ``blk_crypto_evict_key()`` (after all I/O has completed)
6. Zeroize the blk_crypto_key (this has no dedicated function)
If a blk_crypto_key is being used on multiple request_queues, then
``blk_crypto_config_supported()`` (if used), ``blk_crypto_start_using_key()``,
and ``blk_crypto_evict_key()`` must be called on each request_queue.
API presented to device drivers
===============================
A :c:type:``struct blk_keyslot_manager`` should be set up by device drivers in
the ``request_queue`` of the device. The device driver needs to call
``blk_ksm_init`` (or its resource-managed variant ``devm_blk_ksm_init``) on the
``blk_keyslot_manager``, while specifying the number of keyslots supported by
the hardware.
A device driver that wants to support inline encryption must set up a
blk_crypto_profile in the request_queue of its device. To do this, it first
must call ``blk_crypto_profile_init()`` (or its resource-managed variant
``devm_blk_crypto_profile_init()``), providing the number of keyslots.
The device driver also needs to tell the KSM how to actually manipulate the
IE hardware in the device to do things like programming the crypto key into
the IE hardware into a particular keyslot. All this is achieved through the
struct blk_ksm_ll_ops field in the KSM that the device driver
must fill up after initing the ``blk_keyslot_manager``.
Next, it must advertise its crypto capabilities by setting fields in the
blk_crypto_profile, e.g. ``modes_supported`` and ``max_dun_bytes_supported``.
The KSM also handles runtime power management for the device when applicable
(e.g. when it wants to program a crypto key into the IE hardware, the device
must be runtime powered on) - so the device driver must also set the ``dev``
field in the ksm to point to the `struct device` for the KSM to use for runtime
power management.
It then must set function pointers in the ``ll_ops`` field of the
blk_crypto_profile to tell upper layers how to control the inline encryption
hardware, e.g. how to program and evict keyslots. Most drivers will need to
implement ``keyslot_program`` and ``keyslot_evict``. For details, see the
comments for ``struct blk_crypto_ll_ops``.
``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
needs each and every of its keyslots to be reprogrammed with the key it
"should have" at the point in time when the function is called. This is useful
e.g. if a device loses all its keys on runtime power down/up.
Once the driver registers a blk_crypto_profile with a request_queue, I/O
requests the driver receives via that queue may have an encryption context. All
encryption contexts will be compatible with the crypto capabilities declared in
the blk_crypto_profile, so drivers don't need to worry about handling
unsupported requests. Also, if a nonzero number of keyslots was declared in the
blk_crypto_profile, then all I/O requests that have an encryption context will
also have a keyslot which was already programmed with the appropriate key.
If the driver used ``blk_ksm_init`` instead of ``devm_blk_ksm_init``, then
``blk_ksm_destroy`` should be called to free up all resources used by a
``blk_keyslot_manager`` once it is no longer needed.
If the driver implements runtime suspend and its blk_crypto_ll_ops don't work
while the device is runtime-suspended, then the driver must also set the ``dev``
field of the blk_crypto_profile to point to the ``struct device`` that will be
resumed before any of the low-level operations are called.
If there are situations where the inline encryption hardware loses the contents
of its keyslots, e.g. device resets, the driver must handle reprogramming the
keyslots. To do this, the driver may call ``blk_crypto_reprogram_all_keys()``.
Finally, if the driver used ``blk_crypto_profile_init()`` instead of
``devm_blk_crypto_profile_init()``, then it is responsible for calling
``blk_crypto_profile_destroy()`` when the crypto profile is no longer needed.
Layered Devices
===============
Request queue based layered devices like dm-rq that wish to support IE need to
create their own keyslot manager for their request queue, and expose whatever
functionality they choose. When a layered device wants to pass a clone of that
request to another ``request_queue``, blk-crypto will initialize and prepare the
clone as necessary - see ``blk_crypto_insert_cloned_request`` in
``blk-crypto.c``.
Future Optimizations for layered devices
========================================
Creating a keyslot manager for a layered device uses up memory for each
keyslot, and in general, a layered device merely passes the request on to a
"child" device, so the keyslots in the layered device itself are completely
unused, and don't need any refcounting or keyslot programming. We can instead
define a new type of KSM; the "passthrough KSM", that layered devices can use
to advertise an unlimited number of keyslots, and support for any encryption
algorithms they choose, while not actually using any memory for each keyslot.
Another use case for the "passthrough KSM" is for IE devices that do not have a
limited number of keyslots.
Request queue based layered devices like dm-rq that wish to support inline
encryption need to create their own blk_crypto_profile for their request_queue,
and expose whatever functionality they choose. When a layered device wants to
pass a clone of that request to another request_queue, blk-crypto will
initialize and prepare the clone as necessary; see
``blk_crypto_insert_cloned_request()``.
Interaction between inline encryption and blk integrity
=======================================================
@ -257,7 +296,7 @@ Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by essentially setting the keyslot manager in the request_queue of the device
to NULL). When the crypto API fallback is enabled, this means that all bios with
and encryption context will use the fallback, and IO will complete as usual.
When the fallback is disabled, a bio with an encryption context will be failed.
(by setting the blk_crypto_profile in the request_queue of the device to NULL).
When the crypto API fallback is enabled, this means that all bios with and
encryption context will use the fallback, and IO will complete as usual. When
the fallback is disabled, a bio with an encryption context will be failed.

View File

@ -1115,7 +1115,8 @@ export MODORDER := $(extmod_prefix)modules.order
export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
ifeq ($(KBUILD_EXTMOD),)
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/
core-$(CONFIG_BLOCK) += block/
vmlinux-dirs := $(patsubst %/,%,$(filter %/, \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \

View File

@ -58,7 +58,7 @@ struct nfhd_device {
struct gendisk *disk;
};
static blk_qc_t nfhd_submit_bio(struct bio *bio)
static void nfhd_submit_bio(struct bio *bio)
{
struct nfhd_device *dev = bio->bi_bdev->bd_disk->private_data;
struct bio_vec bvec;
@ -76,7 +76,6 @@ static blk_qc_t nfhd_submit_bio(struct bio *bio)
sec += len;
}
bio_endio(bio);
return BLK_QC_T_NONE;
}
static int nfhd_getgeo(struct block_device *bdev, struct hd_geometry *geo)

View File

@ -16,7 +16,6 @@
#include <linux/console.h>
#include <linux/memblock.h>
#include <linux/ioport.h>
#include <linux/blkdev.h>
#include <asm/bootinfo.h>
#include <asm/mach-rc32434/ddr.h>

View File

@ -7,7 +7,6 @@
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/mm.h>
#include <linux/blkdev.h>
#include <linux/memblock.h>
#include <linux/pm.h>
#include <linux/smp.h>

View File

@ -11,7 +11,6 @@
#include <linux/spinlock.h>
#include <linux/mm.h>
#include <linux/memblock.h>
#include <linux/blkdev.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/screen_info.h>

View File

@ -25,7 +25,6 @@
#include <linux/memblock.h>
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/blkdev.h> /* for initrd_* */
#include <linux/pagemap.h>
#include <asm/pgalloc.h>

View File

@ -21,6 +21,7 @@
#include <linux/namei.h>
#include <linux/pagemap.h>
#include <linux/poll.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <asm/prom.h>

View File

@ -27,6 +27,7 @@
#include <linux/blk-mq.h>
#include <linux/ata.h>
#include <linux/hdreg.h>
#include <linux/major.h>
#include <linux/cdrom.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

View File

@ -100,7 +100,7 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
spin_unlock(&dev->lock);
}
static blk_qc_t simdisk_submit_bio(struct bio *bio)
static void simdisk_submit_bio(struct bio *bio)
{
struct simdisk *dev = bio->bi_bdev->bd_disk->private_data;
struct bio_vec bvec;
@ -118,7 +118,6 @@ static blk_qc_t simdisk_submit_bio(struct bio *bio)
}
bio_endio(bio);
return BLK_QC_T_NONE;
}
static int simdisk_open(struct block_device *bdev, fmode_t mode)

View File

@ -73,7 +73,7 @@ config BLK_DEV_ZONED
config BLK_DEV_THROTTLING
bool "Block layer bio throttling support"
depends on BLK_CGROUP=y
depends on BLK_CGROUP
select BLK_CGROUP_RWSTAT
help
Block layer bio throttling support. It can be used to limit
@ -112,7 +112,7 @@ config BLK_WBT_MQ
config BLK_CGROUP_IOLATENCY
bool "Enable support for latency based cgroup IO protection"
depends on BLK_CGROUP=y
depends on BLK_CGROUP
help
Enabling this option enables the .latency interface for IO throttling.
The IO controller will attempt to maintain average IO latencies below
@ -132,7 +132,7 @@ config BLK_CGROUP_FC_APPID
config BLK_CGROUP_IOCOST
bool "Enable support for cost model based cgroup IO controller"
depends on BLK_CGROUP=y
depends on BLK_CGROUP
select BLK_RQ_IO_DATA_LEN
select BLK_RQ_ALLOC_TIME
help
@ -190,39 +190,31 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
by falling back to the kernel crypto API when inline
encryption hardware is not present.
menu "Partition Types"
source "block/partitions/Kconfig"
endmenu
endif # BLOCK
config BLOCK_COMPAT
bool
depends on BLOCK && COMPAT
default y
def_bool COMPAT
config BLK_MQ_PCI
bool
depends on BLOCK && PCI
default y
def_bool PCI
config BLK_MQ_VIRTIO
bool
depends on BLOCK && VIRTIO
depends on VIRTIO
default y
config BLK_MQ_RDMA
bool
depends on BLOCK && INFINIBAND
depends on INFINIBAND
default y
config BLK_PM
def_bool BLOCK && PM
def_bool PM
# do not use in new code
config BLOCK_HOLDER_DEPRECATED
bool
source "block/Kconfig.iosched"
endif # BLOCK

View File

@ -1,6 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
if BLOCK
menu "IO Schedulers"
config MQ_IOSCHED_DEADLINE
@ -45,5 +43,3 @@ config BFQ_CGROUP_DEBUG
files in a cgroup which can be useful for debugging.
endmenu
endif

View File

@ -3,13 +3,13 @@
# Makefile for the kernel block layer
#
obj-$(CONFIG_BLOCK) := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
obj-y := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
blk-exec.o blk-merge.o blk-timeout.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
disk-events.o
disk-events.o blk-ia-ranges.o
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_BLK_DEV_BSG_COMMON) += bsg.o
@ -36,6 +36,6 @@ obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
obj-$(CONFIG_BLK_PM) += blk-pm.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += keyslot-manager.o blk-crypto.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o

View File

@ -12,6 +12,7 @@
#include <linux/major.h>
#include <linux/device_cgroup.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/backing-dev.h>
#include <linux/module.h>
#include <linux/blkpg.h>
@ -326,12 +327,12 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
if (!ops->rw_page || bdev_get_integrity(bdev))
return result;
result = blk_queue_enter(bdev->bd_disk->queue, 0);
result = blk_queue_enter(bdev_get_queue(bdev), 0);
if (result)
return result;
result = ops->rw_page(bdev, sector + get_start_sect(bdev), page,
REQ_OP_READ);
blk_queue_exit(bdev->bd_disk->queue);
blk_queue_exit(bdev_get_queue(bdev));
return result;
}
@ -362,7 +363,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
if (!ops->rw_page || bdev_get_integrity(bdev))
return -EOPNOTSUPP;
result = blk_queue_enter(bdev->bd_disk->queue, 0);
result = blk_queue_enter(bdev_get_queue(bdev), 0);
if (result)
return result;
@ -375,7 +376,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
clean_page_buffers(page);
unlock_page(page);
}
blk_queue_exit(bdev->bd_disk->queue);
blk_queue_exit(bdev_get_queue(bdev));
return result;
}
@ -492,6 +493,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
spin_lock_init(&bdev->bd_size_lock);
bdev->bd_partno = partno;
bdev->bd_inode = inode;
bdev->bd_queue = disk->queue;
bdev->bd_stats = alloc_percpu(struct disk_stats);
if (!bdev->bd_stats) {
iput(inode);
@ -962,9 +964,11 @@ EXPORT_SYMBOL(blkdev_put);
* @pathname: special file representing the block device
* @dev: return value of the block device's dev_t
*
* Get a reference to the blockdevice at @pathname in the current
* namespace if possible and return it. Return ERR_PTR(error)
* otherwise.
* Lookup the block device's dev_t at @pathname in the current
* namespace if possible and return it by @dev.
*
* RETURNS:
* 0 if succeeded, errno otherwise.
*/
int lookup_bdev(const char *pathname, dev_t *dev)
{

View File

@ -6,13 +6,13 @@
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/cgroup.h>
#include <linux/elevator.h>
#include <linux/ktime.h>
#include <linux/rbtree.h>
#include <linux/ioprio.h>
#include <linux/sbitmap.h>
#include <linux/delay.h>
#include "elevator.h"
#include "bfq-iosched.h"
#ifdef CONFIG_BFQ_CGROUP_DEBUG
@ -463,7 +463,7 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
{
if (blkg_rwstat_init(&stats->bytes, gfp) ||
blkg_rwstat_init(&stats->ios, gfp))
return -ENOMEM;
goto error;
#ifdef CONFIG_BFQ_CGROUP_DEBUG
if (blkg_rwstat_init(&stats->merged, gfp) ||
@ -476,13 +476,15 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
bfq_stat_init(&stats->dequeue, gfp) ||
bfq_stat_init(&stats->group_wait_time, gfp) ||
bfq_stat_init(&stats->idle_time, gfp) ||
bfq_stat_init(&stats->empty_time, gfp)) {
bfqg_stats_exit(stats);
return -ENOMEM;
}
bfq_stat_init(&stats->empty_time, gfp))
goto error;
#endif
return 0;
error:
bfqg_stats_exit(stats);
return -ENOMEM;
}
static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)

View File

@ -117,7 +117,6 @@
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/cgroup.h>
#include <linux/elevator.h>
#include <linux/ktime.h>
#include <linux/rbtree.h>
#include <linux/ioprio.h>
@ -127,6 +126,7 @@
#include <trace/events/block.h>
#include "elevator.h"
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
@ -6884,8 +6884,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
struct blk_mq_tags *tags = hctx->sched_tags;
unsigned int min_shallow;
min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
}
static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)

View File

@ -6,7 +6,7 @@
* Written by: Martin K. Petersen <martin.petersen@oracle.com>
*/
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/mempool.h>
#include <linux/export.h>
#include <linux/bio.h>
@ -134,7 +134,7 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
iv = bip->bip_vec + bip->bip_vcnt;
if (bip->bip_vcnt &&
bvec_gap_to_prev(bio->bi_bdev->bd_disk->queue,
bvec_gap_to_prev(bdev_get_queue(bio->bi_bdev),
&bip->bip_vec[bip->bip_vcnt - 1], offset))
return 0;

View File

@ -87,7 +87,8 @@ static struct bio_slab *create_bio_slab(unsigned int size)
snprintf(bslab->name, sizeof(bslab->name), "bio-%d", size);
bslab->slab = kmem_cache_create(bslab->name, size,
ARCH_KMALLOC_MINALIGN, SLAB_HWCACHE_ALIGN, NULL);
ARCH_KMALLOC_MINALIGN,
SLAB_HWCACHE_ALIGN | SLAB_TYPESAFE_BY_RCU, NULL);
if (!bslab->slab)
goto fail_alloc_slab;
@ -156,7 +157,7 @@ out:
void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs)
{
BIO_BUG_ON(nr_vecs > BIO_MAX_VECS);
BUG_ON(nr_vecs > BIO_MAX_VECS);
if (nr_vecs == BIO_MAX_VECS)
mempool_free(bv, pool);
@ -281,6 +282,7 @@ void bio_init(struct bio *bio, struct bio_vec *table,
atomic_set(&bio->__bi_remaining, 1);
atomic_set(&bio->__bi_cnt, 1);
bio->bi_cookie = BLK_QC_T_NONE;
bio->bi_max_vecs = max_vecs;
bio->bi_io_vec = table;
@ -546,7 +548,7 @@ EXPORT_SYMBOL(zero_fill_bio);
* REQ_OP_READ, zero the truncated part. This function should only
* be used for handling corner cases, such as bio eod.
*/
void bio_truncate(struct bio *bio, unsigned new_size)
static void bio_truncate(struct bio *bio, unsigned new_size)
{
struct bio_vec bv;
struct bvec_iter iter;
@ -677,7 +679,7 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
void bio_put(struct bio *bio)
{
if (unlikely(bio_flagged(bio, BIO_REFFED))) {
BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
BUG_ON(!atomic_read(&bio->__bi_cnt));
if (!atomic_dec_and_test(&bio->__bi_cnt))
return;
}
@ -772,6 +774,23 @@ const char *bio_devname(struct bio *bio, char *buf)
}
EXPORT_SYMBOL(bio_devname);
/**
* bio_full - check if the bio is full
* @bio: bio to check
* @len: length of one segment to be added
*
* Return true if @bio is full and one segment with @len bytes can't be
* added to the bio, otherwise return false
*/
static inline bool bio_full(struct bio *bio, unsigned len)
{
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
if (bio->bi_iter.bi_size > UINT_MAX - len)
return true;
return false;
}
static inline bool page_is_mergeable(const struct bio_vec *bv,
struct page *page, unsigned int len, unsigned int off,
bool *same_page)
@ -791,6 +810,44 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
}
/**
* __bio_try_merge_page - try appending data to an existing bvec.
* @bio: destination bio
* @page: start page to add
* @len: length of the data to add
* @off: offset of the data relative to @page
* @same_page: return if the segment has been merged inside the same page
*
* Try to add the data at @page + @off to the last bvec of @bio. This is a
* useful optimisation for file systems with a block size smaller than the
* page size.
*
* Warn if (@len, @off) crosses pages in case that @same_page is true.
*
* Return %true on success or %false on failure.
*/
static bool __bio_try_merge_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int off, bool *same_page)
{
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return false;
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
if (page_is_mergeable(bv, page, len, off, same_page)) {
if (bio->bi_iter.bi_size > UINT_MAX - len) {
*same_page = false;
return false;
}
bv->bv_len += len;
bio->bi_iter.bi_size += len;
return true;
}
}
return false;
}
/*
* Try to merge a page into a segment, while obeying the hardware segment
* size limit. This is not for normal read/write bios, but for passthrough
@ -908,7 +965,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
int bio_add_zone_append_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
bool same_page = false;
if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
@ -922,45 +979,6 @@ int bio_add_zone_append_page(struct bio *bio, struct page *page,
}
EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
/**
* __bio_try_merge_page - try appending data to an existing bvec.
* @bio: destination bio
* @page: start page to add
* @len: length of the data to add
* @off: offset of the data relative to @page
* @same_page: return if the segment has been merged inside the same page
*
* Try to add the data at @page + @off to the last bvec of @bio. This is a
* useful optimisation for file systems with a block size smaller than the
* page size.
*
* Warn if (@len, @off) crosses pages in case that @same_page is true.
*
* Return %true on success or %false on failure.
*/
bool __bio_try_merge_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int off, bool *same_page)
{
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return false;
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
if (page_is_mergeable(bv, page, len, off, same_page)) {
if (bio->bi_iter.bi_size > UINT_MAX - len) {
*same_page = false;
return false;
}
bv->bv_len += len;
bio->bi_iter.bi_size += len;
return true;
}
}
return false;
}
EXPORT_SYMBOL_GPL(__bio_try_merge_page);
/**
* __bio_add_page - add page(s) to a bio in a new segment
* @bio: destination bio
@ -1015,52 +1033,40 @@ int bio_add_page(struct bio *bio, struct page *page,
}
EXPORT_SYMBOL(bio_add_page);
void bio_release_pages(struct bio *bio, bool mark_dirty)
void __bio_release_pages(struct bio *bio, bool mark_dirty)
{
struct bvec_iter_all iter_all;
struct bio_vec *bvec;
if (bio_flagged(bio, BIO_NO_PAGE_REF))
return;
bio_for_each_segment_all(bvec, bio, iter_all) {
if (mark_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
put_page(bvec->bv_page);
}
}
EXPORT_SYMBOL_GPL(bio_release_pages);
EXPORT_SYMBOL_GPL(__bio_release_pages);
static void __bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
{
size_t size = iov_iter_count(iter);
WARN_ON_ONCE(bio->bi_max_vecs);
if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
size_t max_sectors = queue_max_zone_append_sectors(q);
size = min(size, max_sectors << SECTOR_SHIFT);
}
bio->bi_vcnt = iter->nr_segs;
bio->bi_io_vec = (struct bio_vec *)iter->bvec;
bio->bi_iter.bi_bvec_done = iter->iov_offset;
bio->bi_iter.bi_size = iter->count;
bio->bi_iter.bi_size = size;
bio_set_flag(bio, BIO_NO_PAGE_REF);
bio_set_flag(bio, BIO_CLONED);
}
static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
{
__bio_iov_bvec_set(bio, iter);
iov_iter_advance(iter, iter->count);
return 0;
}
static int bio_iov_bvec_set_append(struct bio *bio, struct iov_iter *iter)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct iov_iter i = *iter;
iov_iter_truncate(&i, queue_max_zone_append_sectors(q) << 9);
__bio_iov_bvec_set(bio, &i);
iov_iter_advance(iter, i.count);
return 0;
}
static void bio_put_pages(struct page **pages, size_t size, size_t off)
{
size_t i, nr = DIV_ROUND_UP(size + (off & ~PAGE_MASK), PAGE_SIZE);
@ -1130,7 +1136,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
{
unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
@ -1202,9 +1208,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
int ret = 0;
if (iov_iter_is_bvec(iter)) {
if (bio_op(bio) == REQ_OP_ZONE_APPEND)
return bio_iov_bvec_set_append(bio, iter);
return bio_iov_bvec_set(bio, iter);
bio_iov_bvec_set(bio, iter);
iov_iter_advance(iter, bio->bi_iter.bi_size);
return 0;
}
do {
@ -1260,18 +1266,7 @@ int submit_bio_wait(struct bio *bio)
}
EXPORT_SYMBOL(submit_bio_wait);
/**
* bio_advance - increment/complete a bio by some number of bytes
* @bio: bio to advance
* @bytes: number of bytes to complete
*
* This updates bi_sector, bi_size and bi_idx; if the number of bytes to
* complete doesn't align with a bvec boundary, then bv_len and bv_offset will
* be updated on the last bvec as well.
*
* @bio will then represent the remaining, uncompleted portion of the io.
*/
void bio_advance(struct bio *bio, unsigned bytes)
void __bio_advance(struct bio *bio, unsigned bytes)
{
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);
@ -1279,7 +1274,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
bio_crypt_advance(bio, bytes);
bio_advance_iter(bio, &bio->bi_iter, bytes);
}
EXPORT_SYMBOL(bio_advance);
EXPORT_SYMBOL(__bio_advance);
void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
struct bio *src, struct bvec_iter *src_iter)
@ -1467,10 +1462,10 @@ again:
return;
if (bio->bi_bdev && bio_flagged(bio, BIO_TRACKED))
rq_qos_done_bio(bio->bi_bdev->bd_disk->queue, bio);
rq_qos_done_bio(bdev_get_queue(bio->bi_bdev), bio);
if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio);
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
}

View File

@ -32,6 +32,7 @@
#include <linux/psi.h>
#include "blk.h"
#include "blk-ioprio.h"
#include "blk-throttle.h"
/*
* blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
@ -620,7 +621,7 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
*/
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
char *input, struct blkg_conf_ctx *ctx)
__acquires(rcu) __acquires(&bdev->bd_disk->queue->queue_lock)
__acquires(rcu) __acquires(&bdev->bd_queue->queue_lock)
{
struct block_device *bdev;
struct request_queue *q;
@ -631,7 +632,15 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
q = bdev->bd_disk->queue;
q = bdev_get_queue(bdev);
/*
* blkcg_deactivate_policy() requires queue to be frozen, we can grab
* q_usage_counter to prevent concurrent with blkcg_deactivate_policy().
*/
ret = blk_queue_enter(q, 0);
if (ret)
return ret;
rcu_read_lock();
spin_lock_irq(&q->queue_lock);
@ -702,6 +711,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
goto success;
}
success:
blk_queue_exit(q);
ctx->bdev = bdev;
ctx->blkg = blkg;
ctx->body = input;
@ -714,6 +724,7 @@ fail_unlock:
rcu_read_unlock();
fail:
blkdev_put_no_open(bdev);
blk_queue_exit(q);
/*
* If queue was bypassing, we should retry. Do so after a
* short msleep(). It isn't strictly necessary but queue
@ -736,9 +747,9 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
* with blkg_conf_prep().
*/
void blkg_conf_finish(struct blkg_conf_ctx *ctx)
__releases(&ctx->bdev->bd_disk->queue->queue_lock) __releases(rcu)
__releases(&ctx->bdev->bd_queue->queue_lock) __releases(rcu)
{
spin_unlock_irq(&ctx->bdev->bd_disk->queue->queue_lock);
spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
rcu_read_unlock();
blkdev_put_no_open(ctx->bdev);
}
@ -841,7 +852,7 @@ static void blkcg_fill_root_iostats(void)
while ((dev = class_dev_iter_next(&iter))) {
struct block_device *bdev = dev_to_bdev(dev);
struct blkcg_gq *blkg =
blk_queue_root_blkg(bdev->bd_disk->queue);
blk_queue_root_blkg(bdev_get_queue(bdev));
struct blkg_iostat tmp;
int cpu;
@ -1800,7 +1811,7 @@ static inline struct blkcg_gq *blkg_tryget_closest(struct bio *bio,
rcu_read_lock();
blkg = blkg_lookup_create(css_to_blkcg(css),
bio->bi_bdev->bd_disk->queue);
bdev_get_queue(bio->bi_bdev));
while (blkg) {
if (blkg_tryget(blkg)) {
ret_blkg = blkg;
@ -1836,8 +1847,8 @@ void bio_associate_blkg_from_css(struct bio *bio,
if (css && css->parent) {
bio->bi_blkg = blkg_tryget_closest(bio, css);
} else {
blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
blkg_get(bdev_get_queue(bio->bi_bdev)->root_blkg);
bio->bi_blkg = bdev_get_queue(bio->bi_bdev)->root_blkg;
}
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

View File

@ -18,6 +18,7 @@
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/blk-pm.h>
#include <linux/blk-integrity.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
@ -49,6 +50,7 @@
#include "blk-mq.h"
#include "blk-mq-sched.h"
#include "blk-pm.h"
#include "blk-throttle.h"
struct dentry *blk_debugfs_root;
@ -214,8 +216,7 @@ int blk_status_to_errno(blk_status_t status)
}
EXPORT_SYMBOL_GPL(blk_status_to_errno);
static void print_req_error(struct request *req, blk_status_t status,
const char *caller)
void blk_print_req_error(struct request *req, blk_status_t status)
{
int idx = (__force int)status;
@ -223,9 +224,9 @@ static void print_req_error(struct request *req, blk_status_t status,
return;
printk_ratelimited(KERN_ERR
"%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
"%s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
"phys_seg %u prio class %u\n",
caller, blk_errors[idx].name,
blk_errors[idx].name,
req->rq_disk ? req->rq_disk->disk_name : "?",
blk_rq_pos(req), req_op(req), blk_op_str(req_op(req)),
req->cmd_flags & ~REQ_OP_MASK,
@ -233,33 +234,6 @@ static void print_req_error(struct request *req, blk_status_t status,
IOPRIO_PRIO_CLASS(req->ioprio));
}
static void req_bio_endio(struct request *rq, struct bio *bio,
unsigned int nbytes, blk_status_t error)
{
if (error)
bio->bi_status = error;
if (unlikely(rq->rq_flags & RQF_QUIET))
bio_set_flag(bio, BIO_QUIET);
bio_advance(bio, nbytes);
if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
/*
* Partial zone append completions cannot be supported as the
* BIO fragments may end up not being written sequentially.
*/
if (bio->bi_iter.bi_size)
bio->bi_status = BLK_STS_IOERR;
else
bio->bi_iter.bi_sector = rq->__sector;
}
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
bio_endio(bio);
}
void blk_dump_rq_flags(struct request *rq, char *msg)
{
printk(KERN_INFO "%s: dev %s: flags=%llx\n", msg,
@ -402,7 +376,7 @@ void blk_cleanup_queue(struct request_queue *q)
*/
mutex_lock(&q->sysfs_lock);
if (q->elevator)
blk_mq_sched_free_requests(q);
blk_mq_sched_free_rqs(q);
mutex_unlock(&q->sysfs_lock);
percpu_ref_exit(&q->q_usage_counter);
@ -415,7 +389,7 @@ EXPORT_SYMBOL(blk_cleanup_queue);
static bool blk_try_enter_queue(struct request_queue *q, bool pm)
{
rcu_read_lock();
if (!percpu_ref_tryget_live(&q->q_usage_counter))
if (!percpu_ref_tryget_live_rcu(&q->q_usage_counter))
goto fail;
/*
@ -430,7 +404,7 @@ static bool blk_try_enter_queue(struct request_queue *q, bool pm)
return true;
fail_put:
percpu_ref_put(&q->q_usage_counter);
blk_queue_exit(q);
fail:
rcu_read_unlock();
return false;
@ -470,10 +444,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
static inline int bio_queue_enter(struct bio *bio)
{
struct gendisk *disk = bio->bi_bdev->bd_disk;
struct request_queue *q = disk->queue;
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
while (!blk_try_enter_queue(q, false)) {
struct gendisk *disk = bio->bi_bdev->bd_disk;
if (bio->bi_opf & REQ_NOWAIT) {
if (test_bit(GD_DEAD, &disk->state))
goto dead;
@ -553,7 +528,7 @@ struct request_queue *blk_alloc_queue(int node_id)
q->node = node_id;
atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
atomic_set(&q->nr_active_requests_shared_tags, 0);
timer_setup(&q->timeout, blk_rq_timed_out_timer, 0);
INIT_WORK(&q->timeout_work, blk_timeout_work);
@ -586,7 +561,7 @@ struct request_queue *blk_alloc_queue(int node_id)
blk_queue_dma_alignment(q, 511);
blk_set_default_limits(&q->limits);
q->nr_requests = BLKDEV_MAX_RQ;
q->nr_requests = BLKDEV_DEFAULT_RQ;
return q;
@ -654,8 +629,9 @@ static void handle_bad_sector(struct bio *bio, sector_t maxsector)
{
char b[BDEVNAME_SIZE];
pr_info_ratelimited("attempt to access beyond end of device\n"
pr_info_ratelimited("%s: attempt to access beyond end of device\n"
"%s: rw=%d, want=%llu, limit=%llu\n",
current->comm,
bio_devname(bio, b), bio->bi_opf,
bio_end_sector(bio), maxsector);
}
@ -797,7 +773,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
static noinline_for_stack bool submit_bio_checks(struct bio *bio)
{
struct block_device *bdev = bio->bi_bdev;
struct request_queue *q = bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bdev);
blk_status_t status = BLK_STS_IOERR;
struct blk_plug *plug;
@ -839,7 +815,7 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
}
if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
bio_clear_hipri(bio);
bio_clear_polled(bio);
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
@ -912,25 +888,22 @@ end_io:
return false;
}
static blk_qc_t __submit_bio(struct bio *bio)
static void __submit_bio(struct bio *bio)
{
struct gendisk *disk = bio->bi_bdev->bd_disk;
blk_qc_t ret = BLK_QC_T_NONE;
if (unlikely(bio_queue_enter(bio) != 0))
return BLK_QC_T_NONE;
return;
if (!submit_bio_checks(bio) || !blk_crypto_bio_prep(&bio))
goto queue_exit;
if (disk->fops->submit_bio) {
ret = disk->fops->submit_bio(bio);
goto queue_exit;
if (!disk->fops->submit_bio) {
blk_mq_submit_bio(bio);
return;
}
return blk_mq_submit_bio(bio);
disk->fops->submit_bio(bio);
queue_exit:
blk_queue_exit(disk->queue);
return ret;
}
/*
@ -952,10 +925,9 @@ queue_exit:
* bio_list_on_stack[1] contains bios that were submitted before the current
* ->submit_bio_bio, but that haven't been processed yet.
*/
static blk_qc_t __submit_bio_noacct(struct bio *bio)
static void __submit_bio_noacct(struct bio *bio)
{
struct bio_list bio_list_on_stack[2];
blk_qc_t ret = BLK_QC_T_NONE;
BUG_ON(bio->bi_next);
@ -963,7 +935,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
current->bio_list = bio_list_on_stack;
do {
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
struct bio_list lower, same;
/*
@ -972,7 +944,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
bio_list_on_stack[1] = bio_list_on_stack[0];
bio_list_init(&bio_list_on_stack[0]);
ret = __submit_bio(bio);
__submit_bio(bio);
/*
* Sort new bios into those for a lower level and those for the
@ -981,7 +953,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
bio_list_init(&lower);
bio_list_init(&same);
while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
if (q == bio->bi_bdev->bd_disk->queue)
if (q == bdev_get_queue(bio->bi_bdev))
bio_list_add(&same, bio);
else
bio_list_add(&lower, bio);
@ -995,22 +967,19 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
current->bio_list = NULL;
return ret;
}
static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
static void __submit_bio_noacct_mq(struct bio *bio)
{
struct bio_list bio_list[2] = { };
blk_qc_t ret;
current->bio_list = bio_list;
do {
ret = __submit_bio(bio);
__submit_bio(bio);
} while ((bio = bio_list_pop(&bio_list[0])));
current->bio_list = NULL;
return ret;
}
/**
@ -1022,7 +991,7 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
* systems and other upper level users of the block layer should use
* submit_bio() instead.
*/
blk_qc_t submit_bio_noacct(struct bio *bio)
void submit_bio_noacct(struct bio *bio)
{
/*
* We only want one ->submit_bio to be active at a time, else stack
@ -1030,14 +999,12 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
* to collect a list of requests submited by a ->submit_bio method while
* it is active, and then process them after it returned.
*/
if (current->bio_list) {
if (current->bio_list)
bio_list_add(&current->bio_list[0], bio);
return BLK_QC_T_NONE;
}
if (!bio->bi_bdev->bd_disk->fops->submit_bio)
return __submit_bio_noacct_mq(bio);
return __submit_bio_noacct(bio);
else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
__submit_bio_noacct_mq(bio);
else
__submit_bio_noacct(bio);
}
EXPORT_SYMBOL(submit_bio_noacct);
@ -1054,10 +1021,10 @@ EXPORT_SYMBOL(submit_bio_noacct);
* in @bio. The bio must NOT be touched by thecaller until ->bi_end_io() has
* been called.
*/
blk_qc_t submit_bio(struct bio *bio)
void submit_bio(struct bio *bio)
{
if (blkcg_punt_bio_submit(bio))
return BLK_QC_T_NONE;
return;
/*
* If it's a regular read/write or a barrier with data attached,
@ -1068,7 +1035,7 @@ blk_qc_t submit_bio(struct bio *bio)
if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME))
count = queue_logical_block_size(
bio->bi_bdev->bd_disk->queue) >> 9;
bdev_get_queue(bio->bi_bdev)) >> 9;
else
count = bio_sectors(bio);
@ -1089,19 +1056,92 @@ blk_qc_t submit_bio(struct bio *bio)
if (unlikely(bio_op(bio) == REQ_OP_READ &&
bio_flagged(bio, BIO_WORKINGSET))) {
unsigned long pflags;
blk_qc_t ret;
psi_memstall_enter(&pflags);
ret = submit_bio_noacct(bio);
submit_bio_noacct(bio);
psi_memstall_leave(&pflags);
return ret;
return;
}
return submit_bio_noacct(bio);
submit_bio_noacct(bio);
}
EXPORT_SYMBOL(submit_bio);
/**
* bio_poll - poll for BIO completions
* @bio: bio to poll for
* @flags: BLK_POLL_* flags that control the behavior
*
* Poll for completions on queue associated with the bio. Returns number of
* completed entries found.
*
* Note: the caller must either be the context that submitted @bio, or
* be in a RCU critical section to prevent freeing of @bio.
*/
int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
{
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
int ret;
if (cookie == BLK_QC_T_NONE ||
!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
return 0;
if (current->plug)
blk_flush_plug(current->plug, false);
if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT))
return 0;
if (WARN_ON_ONCE(!queue_is_mq(q)))
ret = 0; /* not yet implemented, should not happen */
else
ret = blk_mq_poll(q, cookie, iob, flags);
blk_queue_exit(q);
return ret;
}
EXPORT_SYMBOL_GPL(bio_poll);
/*
* Helper to implement file_operations.iopoll. Requires the bio to be stored
* in iocb->private, and cleared before freeing the bio.
*/
int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
unsigned int flags)
{
struct bio *bio;
int ret = 0;
/*
* Note: the bio cache only uses SLAB_TYPESAFE_BY_RCU, so bio can
* point to a freshly allocated bio at this point. If that happens
* we have a few cases to consider:
*
* 1) the bio is beeing initialized and bi_bdev is NULL. We can just
* simply nothing in this case
* 2) the bio points to a not poll enabled device. bio_poll will catch
* this and return 0
* 3) the bio points to a poll capable device, including but not
* limited to the one that the original bio pointed to. In this
* case we will call into the actual poll method and poll for I/O,
* even if we don't need to, but it won't cause harm either.
*
* For cases 2) and 3) above the RCU grace period ensures that bi_bdev
* is still allocated. Because partitions hold a reference to the whole
* device bdev and thus disk, the disk is also still valid. Grabbing
* a reference to the queue in bio_poll() ensures the hctxs and requests
* are still valid as well.
*/
rcu_read_lock();
bio = READ_ONCE(kiocb->private);
if (bio && bio->bi_bdev)
ret = bio_poll(bio, iob, flags);
rcu_read_unlock();
return ret;
}
EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
/**
* blk_cloned_rq_check_limits - Helper function to check a cloned request
* for the new queue limits
@ -1177,8 +1217,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
if (blk_crypto_insert_cloned_request(rq))
return BLK_STS_IOERR;
if (blk_queue_io_stat(q))
blk_account_io_start(rq);
blk_account_io_start(rq);
/*
* Since we have a scheduler attached on the top device,
@ -1246,41 +1285,19 @@ again:
}
}
static void blk_account_io_completion(struct request *req, unsigned int bytes)
void __blk_account_io_done(struct request *req, u64 now)
{
if (req->part && blk_do_io_stat(req)) {
const int sgrp = op_stat_group(req_op(req));
const int sgrp = op_stat_group(req_op(req));
part_stat_lock();
part_stat_add(req->part, sectors[sgrp], bytes >> 9);
part_stat_unlock();
}
part_stat_lock();
update_io_ticks(req->part, jiffies, true);
part_stat_inc(req->part, ios[sgrp]);
part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
part_stat_unlock();
}
void blk_account_io_done(struct request *req, u64 now)
void __blk_account_io_start(struct request *rq)
{
/*
* Account IO completion. flush_rq isn't accounted as a
* normal IO on queueing nor completion. Accounting the
* containing request is enough.
*/
if (req->part && blk_do_io_stat(req) &&
!(req->rq_flags & RQF_FLUSH_SEQ)) {
const int sgrp = op_stat_group(req_op(req));
part_stat_lock();
update_io_ticks(req->part, jiffies, true);
part_stat_inc(req->part, ios[sgrp]);
part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
part_stat_unlock();
}
}
void blk_account_io_start(struct request *rq)
{
if (!blk_do_io_stat(rq))
return;
/* passthrough requests can hold bios that do not have ->bi_bdev set */
if (rq->bio && rq->bio->bi_bdev)
rq->part = rq->bio->bi_bdev;
@ -1376,112 +1393,6 @@ void blk_steal_bios(struct bio_list *list, struct request *rq)
}
EXPORT_SYMBOL_GPL(blk_steal_bios);
/**
* blk_update_request - Complete multiple bytes without completing the request
* @req: the request being processed
* @error: block status code
* @nr_bytes: number of bytes to complete for @req
*
* Description:
* Ends I/O on a number of bytes attached to @req, but doesn't complete
* the request structure even if @req doesn't have leftover.
* If @req has leftover, sets it up for the next range of segments.
*
* Passing the result of blk_rq_bytes() as @nr_bytes guarantees
* %false return from this function.
*
* Note:
* The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in this function
* except in the consistency check at the end of this function.
*
* Return:
* %false - this request doesn't have any more data
* %true - this request has more data
**/
bool blk_update_request(struct request *req, blk_status_t error,
unsigned int nr_bytes)
{
int total_bytes;
trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
if (!req->bio)
return false;
#ifdef CONFIG_BLK_DEV_INTEGRITY
if (blk_integrity_rq(req) && req_op(req) == REQ_OP_READ &&
error == BLK_STS_OK)
req->q->integrity.profile->complete_fn(req, nr_bytes);
#endif
if (unlikely(error && !blk_rq_is_passthrough(req) &&
!(req->rq_flags & RQF_QUIET)))
print_req_error(req, error, __func__);
blk_account_io_completion(req, nr_bytes);
total_bytes = 0;
while (req->bio) {
struct bio *bio = req->bio;
unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
if (bio_bytes == bio->bi_iter.bi_size)
req->bio = bio->bi_next;
/* Completion has already been traced */
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
req_bio_endio(req, bio, bio_bytes, error);
total_bytes += bio_bytes;
nr_bytes -= bio_bytes;
if (!nr_bytes)
break;
}
/*
* completely done
*/
if (!req->bio) {
/*
* Reset counters so that the request stacking driver
* can find how many bytes remain in the request
* later.
*/
req->__data_len = 0;
return false;
}
req->__data_len -= total_bytes;
/* update sector only for requests with clear definition of sector */
if (!blk_rq_is_passthrough(req))
req->__sector += total_bytes >> 9;
/* mixed attributes always follow the first bio */
if (req->rq_flags & RQF_MIXED_MERGE) {
req->cmd_flags &= ~REQ_FAILFAST_MASK;
req->cmd_flags |= req->bio->bi_opf & REQ_FAILFAST_MASK;
}
if (!(req->rq_flags & RQF_SPECIAL_PAYLOAD)) {
/*
* If total number of sectors is less than the first segment
* size, something has gone terribly wrong.
*/
if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
blk_dump_rq_flags(req, "request botched");
req->__data_len = blk_rq_cur_bytes(req);
}
/* recalculate the number of segments */
req->nr_phys_segments = blk_recalc_rq_segments(req);
}
return true;
}
EXPORT_SYMBOL_GPL(blk_update_request);
#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
/**
* rq_flush_dcache_pages - Helper function to flush all pages in a request
@ -1629,6 +1540,32 @@ int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
}
EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
{
struct task_struct *tsk = current;
/*
* If this is a nested plug, don't actually assign it.
*/
if (tsk->plug)
return;
plug->mq_list = NULL;
plug->cached_rq = NULL;
plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
plug->rq_count = 0;
plug->multiple_queues = false;
plug->has_elevator = false;
plug->nowait = false;
INIT_LIST_HEAD(&plug->cb_list);
/*
* Store ordering should not be needed here, since a potential
* preempt will imply a full memory barrier
*/
tsk->plug = plug;
}
/**
* blk_start_plug - initialize blk_plug and track it inside the task_struct
* @plug: The &struct blk_plug that needs to be initialized
@ -1654,25 +1591,7 @@ EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
*/
void blk_start_plug(struct blk_plug *plug)
{
struct task_struct *tsk = current;
/*
* If this is a nested plug, don't actually assign it.
*/
if (tsk->plug)
return;
INIT_LIST_HEAD(&plug->mq_list);
INIT_LIST_HEAD(&plug->cb_list);
plug->rq_count = 0;
plug->multiple_queues = false;
plug->nowait = false;
/*
* Store ordering should not be needed here, since a potential
* preempt will imply a full memory barrier
*/
tsk->plug = plug;
blk_start_plug_nr_ios(plug, 1);
}
EXPORT_SYMBOL(blk_start_plug);
@ -1718,12 +1637,14 @@ struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data,
}
EXPORT_SYMBOL(blk_check_plugged);
void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
void blk_flush_plug(struct blk_plug *plug, bool from_schedule)
{
flush_plug_callbacks(plug, from_schedule);
if (!list_empty(&plug->mq_list))
if (!list_empty(&plug->cb_list))
flush_plug_callbacks(plug, from_schedule);
if (!rq_list_empty(plug->mq_list))
blk_mq_flush_plug_list(plug, from_schedule);
if (unlikely(!from_schedule && plug->cached_rq))
blk_mq_free_plug_rqs(plug);
}
/**
@ -1738,11 +1659,10 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
*/
void blk_finish_plug(struct blk_plug *plug)
{
if (plug != current->plug)
return;
blk_flush_plug_list(plug, false);
current->plug = NULL;
if (plug == current->plug) {
blk_flush_plug(plug, false);
current->plug = NULL;
}
}
EXPORT_SYMBOL(blk_finish_plug);

View File

@ -12,12 +12,13 @@
#include <crypto/skcipher.h>
#include <linux/blk-cgroup.h>
#include <linux/blk-crypto.h>
#include <linux/blk-crypto-profile.h>
#include <linux/blkdev.h>
#include <linux/crypto.h>
#include <linux/keyslot-manager.h>
#include <linux/mempool.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/scatterlist.h>
#include "blk-crypto-internal.h"
@ -72,12 +73,12 @@ static mempool_t *bio_fallback_crypt_ctx_pool;
static DEFINE_MUTEX(tfms_init_lock);
static bool tfms_inited[BLK_ENCRYPTION_MODE_MAX];
static struct blk_crypto_keyslot {
static struct blk_crypto_fallback_keyslot {
enum blk_crypto_mode_num crypto_mode;
struct crypto_skcipher *tfms[BLK_ENCRYPTION_MODE_MAX];
} *blk_crypto_keyslots;
static struct blk_keyslot_manager blk_crypto_ksm;
static struct blk_crypto_profile blk_crypto_fallback_profile;
static struct workqueue_struct *blk_crypto_wq;
static mempool_t *blk_crypto_bounce_page_pool;
static struct bio_set crypto_bio_split;
@ -88,9 +89,9 @@ static struct bio_set crypto_bio_split;
*/
static u8 blank_key[BLK_CRYPTO_MAX_KEY_SIZE];
static void blk_crypto_evict_keyslot(unsigned int slot)
static void blk_crypto_fallback_evict_keyslot(unsigned int slot)
{
struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
struct blk_crypto_fallback_keyslot *slotp = &blk_crypto_keyslots[slot];
enum blk_crypto_mode_num crypto_mode = slotp->crypto_mode;
int err;
@ -103,45 +104,41 @@ static void blk_crypto_evict_keyslot(unsigned int slot)
slotp->crypto_mode = BLK_ENCRYPTION_MODE_INVALID;
}
static int blk_crypto_keyslot_program(struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key,
unsigned int slot)
static int
blk_crypto_fallback_keyslot_program(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key,
unsigned int slot)
{
struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
struct blk_crypto_fallback_keyslot *slotp = &blk_crypto_keyslots[slot];
const enum blk_crypto_mode_num crypto_mode =
key->crypto_cfg.crypto_mode;
int err;
if (crypto_mode != slotp->crypto_mode &&
slotp->crypto_mode != BLK_ENCRYPTION_MODE_INVALID)
blk_crypto_evict_keyslot(slot);
blk_crypto_fallback_evict_keyslot(slot);
slotp->crypto_mode = crypto_mode;
err = crypto_skcipher_setkey(slotp->tfms[crypto_mode], key->raw,
key->size);
if (err) {
blk_crypto_evict_keyslot(slot);
blk_crypto_fallback_evict_keyslot(slot);
return err;
}
return 0;
}
static int blk_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key,
unsigned int slot)
static int blk_crypto_fallback_keyslot_evict(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key,
unsigned int slot)
{
blk_crypto_evict_keyslot(slot);
blk_crypto_fallback_evict_keyslot(slot);
return 0;
}
/*
* The crypto API fallback KSM ops - only used for a bio when it specifies a
* blk_crypto_key that was not supported by the device's inline encryption
* hardware.
*/
static const struct blk_ksm_ll_ops blk_crypto_ksm_ll_ops = {
.keyslot_program = blk_crypto_keyslot_program,
.keyslot_evict = blk_crypto_keyslot_evict,
static const struct blk_crypto_ll_ops blk_crypto_fallback_ll_ops = {
.keyslot_program = blk_crypto_fallback_keyslot_program,
.keyslot_evict = blk_crypto_fallback_keyslot_evict,
};
static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
@ -159,7 +156,7 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
bio_endio(src_bio);
}
static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
{
struct bvec_iter iter;
struct bio_vec bv;
@ -186,13 +183,14 @@ static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
return bio;
}
static bool blk_crypto_alloc_cipher_req(struct blk_ksm_keyslot *slot,
struct skcipher_request **ciph_req_ret,
struct crypto_wait *wait)
static bool
blk_crypto_fallback_alloc_cipher_req(struct blk_crypto_keyslot *slot,
struct skcipher_request **ciph_req_ret,
struct crypto_wait *wait)
{
struct skcipher_request *ciph_req;
const struct blk_crypto_keyslot *slotp;
int keyslot_idx = blk_ksm_get_slot_idx(slot);
const struct blk_crypto_fallback_keyslot *slotp;
int keyslot_idx = blk_crypto_keyslot_index(slot);
slotp = &blk_crypto_keyslots[keyslot_idx];
ciph_req = skcipher_request_alloc(slotp->tfms[slotp->crypto_mode],
@ -209,7 +207,7 @@ static bool blk_crypto_alloc_cipher_req(struct blk_ksm_keyslot *slot,
return true;
}
static bool blk_crypto_split_bio_if_needed(struct bio **bio_ptr)
static bool blk_crypto_fallback_split_bio_if_needed(struct bio **bio_ptr)
{
struct bio *bio = *bio_ptr;
unsigned int i = 0;
@ -264,7 +262,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
{
struct bio *src_bio, *enc_bio;
struct bio_crypt_ctx *bc;
struct blk_ksm_keyslot *slot;
struct blk_crypto_keyslot *slot;
int data_unit_size;
struct skcipher_request *ciph_req = NULL;
DECLARE_CRYPTO_WAIT(wait);
@ -276,7 +274,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
blk_status_t blk_st;
/* Split the bio if it's too big for single page bvec */
if (!blk_crypto_split_bio_if_needed(bio_ptr))
if (!blk_crypto_fallback_split_bio_if_needed(bio_ptr))
return false;
src_bio = *bio_ptr;
@ -284,24 +282,25 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
/* Allocate bounce bio for encryption */
enc_bio = blk_crypto_clone_bio(src_bio);
enc_bio = blk_crypto_fallback_clone_bio(src_bio);
if (!enc_bio) {
src_bio->bi_status = BLK_STS_RESOURCE;
return false;
}
/*
* Use the crypto API fallback keyslot manager to get a crypto_skcipher
* for the algorithm and key specified for this bio.
* Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
* this bio's algorithm and key.
*/
blk_st = blk_ksm_get_slot_for_key(&blk_crypto_ksm, bc->bc_key, &slot);
blk_st = blk_crypto_get_keyslot(&blk_crypto_fallback_profile,
bc->bc_key, &slot);
if (blk_st != BLK_STS_OK) {
src_bio->bi_status = blk_st;
goto out_put_enc_bio;
}
/* and then allocate an skcipher_request for it */
if (!blk_crypto_alloc_cipher_req(slot, &ciph_req, &wait)) {
if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
src_bio->bi_status = BLK_STS_RESOURCE;
goto out_release_keyslot;
}
@ -362,7 +361,7 @@ out_free_bounce_pages:
out_free_ciph_req:
skcipher_request_free(ciph_req);
out_release_keyslot:
blk_ksm_put_slot(slot);
blk_crypto_put_keyslot(slot);
out_put_enc_bio:
if (enc_bio)
bio_put(enc_bio);
@ -380,7 +379,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
container_of(work, struct bio_fallback_crypt_ctx, work);
struct bio *bio = f_ctx->bio;
struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
struct blk_ksm_keyslot *slot;
struct blk_crypto_keyslot *slot;
struct skcipher_request *ciph_req = NULL;
DECLARE_CRYPTO_WAIT(wait);
u64 curr_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
@ -393,17 +392,18 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
blk_status_t blk_st;
/*
* Use the crypto API fallback keyslot manager to get a crypto_skcipher
* for the algorithm and key specified for this bio.
* Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
* this bio's algorithm and key.
*/
blk_st = blk_ksm_get_slot_for_key(&blk_crypto_ksm, bc->bc_key, &slot);
blk_st = blk_crypto_get_keyslot(&blk_crypto_fallback_profile,
bc->bc_key, &slot);
if (blk_st != BLK_STS_OK) {
bio->bi_status = blk_st;
goto out_no_keyslot;
}
/* and then allocate an skcipher_request for it */
if (!blk_crypto_alloc_cipher_req(slot, &ciph_req, &wait)) {
if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
bio->bi_status = BLK_STS_RESOURCE;
goto out;
}
@ -434,7 +434,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
out:
skcipher_request_free(ciph_req);
blk_ksm_put_slot(slot);
blk_crypto_put_keyslot(slot);
out_no_keyslot:
mempool_free(f_ctx, bio_fallback_crypt_ctx_pool);
bio_endio(bio);
@ -473,9 +473,9 @@ static void blk_crypto_fallback_decrypt_endio(struct bio *bio)
* @bio_ptr: pointer to the bio to prepare
*
* If bio is doing a WRITE operation, this splits the bio into two parts if it's
* too big (see blk_crypto_split_bio_if_needed). It then allocates a bounce bio
* for the first part, encrypts it, and update bio_ptr to point to the bounce
* bio.
* too big (see blk_crypto_fallback_split_bio_if_needed()). It then allocates a
* bounce bio for the first part, encrypts it, and updates bio_ptr to point to
* the bounce bio.
*
* For a READ operation, we mark the bio for decryption by using bi_private and
* bi_end_io.
@ -499,8 +499,8 @@ bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
return false;
}
if (!blk_ksm_crypto_cfg_supported(&blk_crypto_ksm,
&bc->bc_key->crypto_cfg)) {
if (!__blk_crypto_cfg_supported(&blk_crypto_fallback_profile,
&bc->bc_key->crypto_cfg)) {
bio->bi_status = BLK_STS_NOTSUPP;
return false;
}
@ -526,7 +526,7 @@ bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
int blk_crypto_fallback_evict_key(const struct blk_crypto_key *key)
{
return blk_ksm_evict_key(&blk_crypto_ksm, key);
return __blk_crypto_evict_key(&blk_crypto_fallback_profile, key);
}
static bool blk_crypto_fallback_inited;
@ -534,6 +534,7 @@ static int blk_crypto_fallback_init(void)
{
int i;
int err;
struct blk_crypto_profile *profile = &blk_crypto_fallback_profile;
if (blk_crypto_fallback_inited)
return 0;
@ -544,24 +545,24 @@ static int blk_crypto_fallback_init(void)
if (err)
goto out;
err = blk_ksm_init(&blk_crypto_ksm, blk_crypto_num_keyslots);
err = blk_crypto_profile_init(profile, blk_crypto_num_keyslots);
if (err)
goto fail_free_bioset;
err = -ENOMEM;
blk_crypto_ksm.ksm_ll_ops = blk_crypto_ksm_ll_ops;
blk_crypto_ksm.max_dun_bytes_supported = BLK_CRYPTO_MAX_IV_SIZE;
profile->ll_ops = blk_crypto_fallback_ll_ops;
profile->max_dun_bytes_supported = BLK_CRYPTO_MAX_IV_SIZE;
/* All blk-crypto modes have a crypto API fallback. */
for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++)
blk_crypto_ksm.crypto_modes_supported[i] = 0xFFFFFFFF;
blk_crypto_ksm.crypto_modes_supported[BLK_ENCRYPTION_MODE_INVALID] = 0;
profile->modes_supported[i] = 0xFFFFFFFF;
profile->modes_supported[BLK_ENCRYPTION_MODE_INVALID] = 0;
blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
WQ_UNBOUND | WQ_HIGHPRI |
WQ_MEM_RECLAIM, num_online_cpus());
if (!blk_crypto_wq)
goto fail_free_ksm;
goto fail_destroy_profile;
blk_crypto_keyslots = kcalloc(blk_crypto_num_keyslots,
sizeof(blk_crypto_keyslots[0]),
@ -595,8 +596,8 @@ fail_free_keyslots:
kfree(blk_crypto_keyslots);
fail_free_wq:
destroy_workqueue(blk_crypto_wq);
fail_free_ksm:
blk_ksm_destroy(&blk_crypto_ksm);
fail_destroy_profile:
blk_crypto_profile_destroy(profile);
fail_free_bioset:
bioset_exit(&crypto_bio_split);
out:
@ -610,7 +611,7 @@ out:
int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
{
const char *cipher_str = blk_crypto_modes[mode_num].cipher_str;
struct blk_crypto_keyslot *slotp;
struct blk_crypto_fallback_keyslot *slotp;
unsigned int i;
int err = 0;

View File

@ -7,7 +7,7 @@
#define __LINUX_BLK_CRYPTO_INTERNAL_H
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
/* Represents a crypto mode supported by blk-crypto */
struct blk_crypto_mode {

565
block/blk-crypto-profile.c Normal file
View File

@ -0,0 +1,565 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2019 Google LLC
*/
/**
* DOC: blk-crypto profiles
*
* 'struct blk_crypto_profile' contains all generic inline encryption-related
* state for a particular inline encryption device. blk_crypto_profile serves
* as the way that drivers for inline encryption hardware expose their crypto
* capabilities and certain functions (e.g., functions to program and evict
* keys) to upper layers. Device drivers that want to support inline encryption
* construct a crypto profile, then associate it with the disk's request_queue.
*
* If the device has keyslots, then its blk_crypto_profile also handles managing
* these keyslots in a device-independent way, using the driver-provided
* functions to program and evict keys as needed. This includes keeping track
* of which key and how many I/O requests are using each keyslot, getting
* keyslots for I/O requests, and handling key eviction requests.
*
* For more information, see Documentation/block/inline-encryption.rst.
*/
#define pr_fmt(fmt) "blk-crypto: " fmt
#include <linux/blk-crypto-profile.h>
#include <linux/device.h>
#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/wait.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
struct blk_crypto_keyslot {
atomic_t slot_refs;
struct list_head idle_slot_node;
struct hlist_node hash_node;
const struct blk_crypto_key *key;
struct blk_crypto_profile *profile;
};
static inline void blk_crypto_hw_enter(struct blk_crypto_profile *profile)
{
/*
* Calling into the driver requires profile->lock held and the device
* resumed. But we must resume the device first, since that can acquire
* and release profile->lock via blk_crypto_reprogram_all_keys().
*/
if (profile->dev)
pm_runtime_get_sync(profile->dev);
down_write(&profile->lock);
}
static inline void blk_crypto_hw_exit(struct blk_crypto_profile *profile)
{
up_write(&profile->lock);
if (profile->dev)
pm_runtime_put_sync(profile->dev);
}
/**
* blk_crypto_profile_init() - Initialize a blk_crypto_profile
* @profile: the blk_crypto_profile to initialize
* @num_slots: the number of keyslots
*
* Storage drivers must call this when starting to set up a blk_crypto_profile,
* before filling in additional fields.
*
* Return: 0 on success, or else a negative error code.
*/
int blk_crypto_profile_init(struct blk_crypto_profile *profile,
unsigned int num_slots)
{
unsigned int slot;
unsigned int i;
unsigned int slot_hashtable_size;
memset(profile, 0, sizeof(*profile));
init_rwsem(&profile->lock);
if (num_slots == 0)
return 0;
/* Initialize keyslot management data. */
profile->slots = kvcalloc(num_slots, sizeof(profile->slots[0]),
GFP_KERNEL);
if (!profile->slots)
return -ENOMEM;
profile->num_slots = num_slots;
init_waitqueue_head(&profile->idle_slots_wait_queue);
INIT_LIST_HEAD(&profile->idle_slots);
for (slot = 0; slot < num_slots; slot++) {
profile->slots[slot].profile = profile;
list_add_tail(&profile->slots[slot].idle_slot_node,
&profile->idle_slots);
}
spin_lock_init(&profile->idle_slots_lock);
slot_hashtable_size = roundup_pow_of_two(num_slots);
/*
* hash_ptr() assumes bits != 0, so ensure the hash table has at least 2
* buckets. This only makes a difference when there is only 1 keyslot.
*/
if (slot_hashtable_size < 2)
slot_hashtable_size = 2;
profile->log_slot_ht_size = ilog2(slot_hashtable_size);
profile->slot_hashtable =
kvmalloc_array(slot_hashtable_size,
sizeof(profile->slot_hashtable[0]), GFP_KERNEL);
if (!profile->slot_hashtable)
goto err_destroy;
for (i = 0; i < slot_hashtable_size; i++)
INIT_HLIST_HEAD(&profile->slot_hashtable[i]);
return 0;
err_destroy:
blk_crypto_profile_destroy(profile);
return -ENOMEM;
}
EXPORT_SYMBOL_GPL(blk_crypto_profile_init);
static void blk_crypto_profile_destroy_callback(void *profile)
{
blk_crypto_profile_destroy(profile);
}
/**
* devm_blk_crypto_profile_init() - Resource-managed blk_crypto_profile_init()
* @dev: the device which owns the blk_crypto_profile
* @profile: the blk_crypto_profile to initialize
* @num_slots: the number of keyslots
*
* Like blk_crypto_profile_init(), but causes blk_crypto_profile_destroy() to be
* called automatically on driver detach.
*
* Return: 0 on success, or else a negative error code.
*/
int devm_blk_crypto_profile_init(struct device *dev,
struct blk_crypto_profile *profile,
unsigned int num_slots)
{
int err = blk_crypto_profile_init(profile, num_slots);
if (err)
return err;
return devm_add_action_or_reset(dev,
blk_crypto_profile_destroy_callback,
profile);
}
EXPORT_SYMBOL_GPL(devm_blk_crypto_profile_init);
static inline struct hlist_head *
blk_crypto_hash_bucket_for_key(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key)
{
return &profile->slot_hashtable[
hash_ptr(key, profile->log_slot_ht_size)];
}
static void
blk_crypto_remove_slot_from_lru_list(struct blk_crypto_keyslot *slot)
{
struct blk_crypto_profile *profile = slot->profile;
unsigned long flags;
spin_lock_irqsave(&profile->idle_slots_lock, flags);
list_del(&slot->idle_slot_node);
spin_unlock_irqrestore(&profile->idle_slots_lock, flags);
}
static struct blk_crypto_keyslot *
blk_crypto_find_keyslot(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key)
{
const struct hlist_head *head =
blk_crypto_hash_bucket_for_key(profile, key);
struct blk_crypto_keyslot *slotp;
hlist_for_each_entry(slotp, head, hash_node) {
if (slotp->key == key)
return slotp;
}
return NULL;
}
static struct blk_crypto_keyslot *
blk_crypto_find_and_grab_keyslot(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key)
{
struct blk_crypto_keyslot *slot;
slot = blk_crypto_find_keyslot(profile, key);
if (!slot)
return NULL;
if (atomic_inc_return(&slot->slot_refs) == 1) {
/* Took first reference to this slot; remove it from LRU list */
blk_crypto_remove_slot_from_lru_list(slot);
}
return slot;
}
/**
* blk_crypto_keyslot_index() - Get the index of a keyslot
* @slot: a keyslot that blk_crypto_get_keyslot() returned
*
* Return: the 0-based index of the keyslot within the device's keyslots.
*/
unsigned int blk_crypto_keyslot_index(struct blk_crypto_keyslot *slot)
{
return slot - slot->profile->slots;
}
EXPORT_SYMBOL_GPL(blk_crypto_keyslot_index);
/**
* blk_crypto_get_keyslot() - Get a keyslot for a key, if needed.
* @profile: the crypto profile of the device the key will be used on
* @key: the key that will be used
* @slot_ptr: If a keyslot is allocated, an opaque pointer to the keyslot struct
* will be stored here; otherwise NULL will be stored here.
*
* If the device has keyslots, this gets a keyslot that's been programmed with
* the specified key. If the key is already in a slot, this reuses it;
* otherwise this waits for a slot to become idle and programs the key into it.
*
* This must be paired with a call to blk_crypto_put_keyslot().
*
* Context: Process context. Takes and releases profile->lock.
* Return: BLK_STS_OK on success, meaning that either a keyslot was allocated or
* one wasn't needed; or a blk_status_t error on failure.
*/
blk_status_t blk_crypto_get_keyslot(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key,
struct blk_crypto_keyslot **slot_ptr)
{
struct blk_crypto_keyslot *slot;
int slot_idx;
int err;
*slot_ptr = NULL;
/*
* If the device has no concept of "keyslots", then there is no need to
* get one.
*/
if (profile->num_slots == 0)
return BLK_STS_OK;
down_read(&profile->lock);
slot = blk_crypto_find_and_grab_keyslot(profile, key);
up_read(&profile->lock);
if (slot)
goto success;
for (;;) {
blk_crypto_hw_enter(profile);
slot = blk_crypto_find_and_grab_keyslot(profile, key);
if (slot) {
blk_crypto_hw_exit(profile);
goto success;
}
/*
* If we're here, that means there wasn't a slot that was
* already programmed with the key. So try to program it.
*/
if (!list_empty(&profile->idle_slots))
break;
blk_crypto_hw_exit(profile);
wait_event(profile->idle_slots_wait_queue,
!list_empty(&profile->idle_slots));
}
slot = list_first_entry(&profile->idle_slots, struct blk_crypto_keyslot,
idle_slot_node);
slot_idx = blk_crypto_keyslot_index(slot);
err = profile->ll_ops.keyslot_program(profile, key, slot_idx);
if (err) {
wake_up(&profile->idle_slots_wait_queue);
blk_crypto_hw_exit(profile);
return errno_to_blk_status(err);
}
/* Move this slot to the hash list for the new key. */
if (slot->key)
hlist_del(&slot->hash_node);
slot->key = key;
hlist_add_head(&slot->hash_node,
blk_crypto_hash_bucket_for_key(profile, key));
atomic_set(&slot->slot_refs, 1);
blk_crypto_remove_slot_from_lru_list(slot);
blk_crypto_hw_exit(profile);
success:
*slot_ptr = slot;
return BLK_STS_OK;
}
/**
* blk_crypto_put_keyslot() - Release a reference to a keyslot
* @slot: The keyslot to release the reference of (may be NULL).
*
* Context: Any context.
*/
void blk_crypto_put_keyslot(struct blk_crypto_keyslot *slot)
{
struct blk_crypto_profile *profile;
unsigned long flags;
if (!slot)
return;
profile = slot->profile;
if (atomic_dec_and_lock_irqsave(&slot->slot_refs,
&profile->idle_slots_lock, flags)) {
list_add_tail(&slot->idle_slot_node, &profile->idle_slots);
spin_unlock_irqrestore(&profile->idle_slots_lock, flags);
wake_up(&profile->idle_slots_wait_queue);
}
}
/**
* __blk_crypto_cfg_supported() - Check whether the given crypto profile
* supports the given crypto configuration.
* @profile: the crypto profile to check
* @cfg: the crypto configuration to check for
*
* Return: %true if @profile supports the given @cfg.
*/
bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
const struct blk_crypto_config *cfg)
{
if (!profile)
return false;
if (!(profile->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
return false;
if (profile->max_dun_bytes_supported < cfg->dun_bytes)
return false;
return true;
}
/**
* __blk_crypto_evict_key() - Evict a key from a device.
* @profile: the crypto profile of the device
* @key: the key to evict. It must not still be used in any I/O.
*
* If the device has keyslots, this finds the keyslot (if any) that contains the
* specified key and calls the driver's keyslot_evict function to evict it.
*
* Otherwise, this just calls the driver's keyslot_evict function if it is
* implemented, passing just the key (without any particular keyslot). This
* allows layered devices to evict the key from their underlying devices.
*
* Context: Process context. Takes and releases profile->lock.
* Return: 0 on success or if there's no keyslot with the specified key, -EBUSY
* if the keyslot is still in use, or another -errno value on other
* error.
*/
int __blk_crypto_evict_key(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key)
{
struct blk_crypto_keyslot *slot;
int err = 0;
if (profile->num_slots == 0) {
if (profile->ll_ops.keyslot_evict) {
blk_crypto_hw_enter(profile);
err = profile->ll_ops.keyslot_evict(profile, key, -1);
blk_crypto_hw_exit(profile);
return err;
}
return 0;
}
blk_crypto_hw_enter(profile);
slot = blk_crypto_find_keyslot(profile, key);
if (!slot)
goto out_unlock;
if (WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)) {
err = -EBUSY;
goto out_unlock;
}
err = profile->ll_ops.keyslot_evict(profile, key,
blk_crypto_keyslot_index(slot));
if (err)
goto out_unlock;
hlist_del(&slot->hash_node);
slot->key = NULL;
err = 0;
out_unlock:
blk_crypto_hw_exit(profile);
return err;
}
/**
* blk_crypto_reprogram_all_keys() - Re-program all keyslots.
* @profile: The crypto profile
*
* Re-program all keyslots that are supposed to have a key programmed. This is
* intended only for use by drivers for hardware that loses its keys on reset.
*
* Context: Process context. Takes and releases profile->lock.
*/
void blk_crypto_reprogram_all_keys(struct blk_crypto_profile *profile)
{
unsigned int slot;
if (profile->num_slots == 0)
return;
/* This is for device initialization, so don't resume the device */
down_write(&profile->lock);
for (slot = 0; slot < profile->num_slots; slot++) {
const struct blk_crypto_key *key = profile->slots[slot].key;
int err;
if (!key)
continue;
err = profile->ll_ops.keyslot_program(profile, key, slot);
WARN_ON(err);
}
up_write(&profile->lock);
}
EXPORT_SYMBOL_GPL(blk_crypto_reprogram_all_keys);
void blk_crypto_profile_destroy(struct blk_crypto_profile *profile)
{
if (!profile)
return;
kvfree(profile->slot_hashtable);
kvfree_sensitive(profile->slots,
sizeof(profile->slots[0]) * profile->num_slots);
memzero_explicit(profile, sizeof(*profile));
}
EXPORT_SYMBOL_GPL(blk_crypto_profile_destroy);
bool blk_crypto_register(struct blk_crypto_profile *profile,
struct request_queue *q)
{
if (blk_integrity_queue_supports_integrity(q)) {
pr_warn("Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
return false;
}
q->crypto_profile = profile;
return true;
}
EXPORT_SYMBOL_GPL(blk_crypto_register);
void blk_crypto_unregister(struct request_queue *q)
{
q->crypto_profile = NULL;
}
/**
* blk_crypto_intersect_capabilities() - restrict supported crypto capabilities
* by child device
* @parent: the crypto profile for the parent device
* @child: the crypto profile for the child device, or NULL
*
* This clears all crypto capabilities in @parent that aren't set in @child. If
* @child is NULL, then this clears all parent capabilities.
*
* Only use this when setting up the crypto profile for a layered device, before
* it's been exposed yet.
*/
void blk_crypto_intersect_capabilities(struct blk_crypto_profile *parent,
const struct blk_crypto_profile *child)
{
if (child) {
unsigned int i;
parent->max_dun_bytes_supported =
min(parent->max_dun_bytes_supported,
child->max_dun_bytes_supported);
for (i = 0; i < ARRAY_SIZE(child->modes_supported); i++)
parent->modes_supported[i] &= child->modes_supported[i];
} else {
parent->max_dun_bytes_supported = 0;
memset(parent->modes_supported, 0,
sizeof(parent->modes_supported));
}
}
EXPORT_SYMBOL_GPL(blk_crypto_intersect_capabilities);
/**
* blk_crypto_has_capabilities() - Check whether @target supports at least all
* the crypto capabilities that @reference does.
* @target: the target profile
* @reference: the reference profile
*
* Return: %true if @target supports all the crypto capabilities of @reference.
*/
bool blk_crypto_has_capabilities(const struct blk_crypto_profile *target,
const struct blk_crypto_profile *reference)
{
int i;
if (!reference)
return true;
if (!target)
return false;
for (i = 0; i < ARRAY_SIZE(target->modes_supported); i++) {
if (reference->modes_supported[i] & ~target->modes_supported[i])
return false;
}
if (reference->max_dun_bytes_supported >
target->max_dun_bytes_supported)
return false;
return true;
}
EXPORT_SYMBOL_GPL(blk_crypto_has_capabilities);
/**
* blk_crypto_update_capabilities() - Update the capabilities of a crypto
* profile to match those of another crypto
* profile.
* @dst: The crypto profile whose capabilities to update.
* @src: The crypto profile whose capabilities this function will update @dst's
* capabilities to.
*
* Blk-crypto requires that crypto capabilities that were
* advertised when a bio was created continue to be supported by the
* device until that bio is ended. This is turn means that a device cannot
* shrink its advertised crypto capabilities without any explicit
* synchronization with upper layers. So if there's no such explicit
* synchronization, @src must support all the crypto capabilities that
* @dst does (i.e. we need blk_crypto_has_capabilities(@src, @dst)).
*
* Note also that as long as the crypto capabilities are being expanded, the
* order of updates becoming visible is not important because it's alright
* for blk-crypto to see stale values - they only cause blk-crypto to
* believe that a crypto capability isn't supported when it actually is (which
* might result in blk-crypto-fallback being used if available, or the bio being
* failed).
*/
void blk_crypto_update_capabilities(struct blk_crypto_profile *dst,
const struct blk_crypto_profile *src)
{
memcpy(dst->modes_supported, src->modes_supported,
sizeof(dst->modes_supported));
dst->max_dun_bytes_supported = src->max_dun_bytes_supported;
}
EXPORT_SYMBOL_GPL(blk_crypto_update_capabilities);

View File

@ -11,7 +11,7 @@
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/keyslot-manager.h>
#include <linux/blk-crypto-profile.h>
#include <linux/module.h>
#include <linux/slab.h>
@ -218,8 +218,9 @@ static bool bio_crypt_check_alignment(struct bio *bio)
blk_status_t __blk_crypto_init_request(struct request *rq)
{
return blk_ksm_get_slot_for_key(rq->q->ksm, rq->crypt_ctx->bc_key,
&rq->crypt_keyslot);
return blk_crypto_get_keyslot(rq->q->crypto_profile,
rq->crypt_ctx->bc_key,
&rq->crypt_keyslot);
}
/**
@ -233,7 +234,7 @@ blk_status_t __blk_crypto_init_request(struct request *rq)
*/
void __blk_crypto_free_request(struct request *rq)
{
blk_ksm_put_slot(rq->crypt_keyslot);
blk_crypto_put_keyslot(rq->crypt_keyslot);
mempool_free(rq->crypt_ctx, bio_crypt_ctx_pool);
blk_crypto_rq_set_defaults(rq);
}
@ -264,6 +265,7 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
{
struct bio *bio = *bio_ptr;
const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
struct blk_crypto_profile *profile;
/* Error if bio has no data. */
if (WARN_ON_ONCE(!bio_has_data(bio))) {
@ -280,8 +282,8 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
* Success if device supports the encryption context, or if we succeeded
* in falling back to the crypto API.
*/
if (blk_ksm_crypto_cfg_supported(bio->bi_bdev->bd_disk->queue->ksm,
&bc_key->crypto_cfg))
profile = bdev_get_queue(bio->bi_bdev)->crypto_profile;
if (__blk_crypto_cfg_supported(profile, &bc_key->crypto_cfg))
return true;
if (blk_crypto_fallback_bio_prep(bio_ptr))
@ -357,7 +359,7 @@ bool blk_crypto_config_supported(struct request_queue *q,
const struct blk_crypto_config *cfg)
{
return IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
blk_ksm_crypto_cfg_supported(q->ksm, cfg);
__blk_crypto_cfg_supported(q->crypto_profile, cfg);
}
/**
@ -378,7 +380,7 @@ bool blk_crypto_config_supported(struct request_queue *q,
int blk_crypto_start_using_key(const struct blk_crypto_key *key,
struct request_queue *q)
{
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
return 0;
return blk_crypto_fallback_start_using_mode(key->crypto_cfg.crypto_mode);
}
@ -394,18 +396,17 @@ int blk_crypto_start_using_key(const struct blk_crypto_key *key,
* evicted from any hardware that it might have been programmed into. The key
* must not be in use by any in-flight IO when this function is called.
*
* Return: 0 on success or if key is not present in the q's ksm, -err on error.
* Return: 0 on success or if the key wasn't in any keyslot; -errno on error.
*/
int blk_crypto_evict_key(struct request_queue *q,
const struct blk_crypto_key *key)
{
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
return blk_ksm_evict_key(q->ksm, key);
if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
return __blk_crypto_evict_key(q->crypto_profile, key);
/*
* If the request queue's associated inline encryption hardware didn't
* have support for the key, then the key might have been programmed
* into the fallback keyslot manager, so try to evict from there.
* If the request_queue didn't support the key, then blk-crypto-fallback
* may have been used, so try to evict the key from blk-crypto-fallback.
*/
return blk_crypto_fallback_evict_key(key);
}

View File

@ -65,13 +65,19 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
static bool blk_rq_is_poll(struct request *rq)
{
return rq->mq_hctx && rq->mq_hctx->type == HCTX_TYPE_POLL;
if (!rq->mq_hctx)
return false;
if (rq->mq_hctx->type != HCTX_TYPE_POLL)
return false;
if (WARN_ON_ONCE(!rq->bio))
return false;
return true;
}
static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
{
do {
blk_poll(rq->q, request_to_qc_t(rq->mq_hctx, rq), true);
bio_poll(rq->bio, NULL, 0);
cond_resched();
} while (!completion_done(wait));
}

View File

@ -379,7 +379,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
* @rq is being submitted. Analyze what needs to be done and put it on the
* right queue.
*/
void blk_insert_flush(struct request *rq)
bool blk_insert_flush(struct request *rq)
{
struct request_queue *q = rq->q;
unsigned long fflags = q->queue_flags; /* may change, cache */
@ -409,7 +409,7 @@ void blk_insert_flush(struct request *rq)
*/
if (!policy) {
blk_mq_end_request(rq, 0);
return;
return true;
}
BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
@ -420,10 +420,8 @@ void blk_insert_flush(struct request *rq)
* for normal execution.
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
blk_mq_request_bypass_insert(rq, false, false);
return;
}
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH)))
return false;
/*
* @rq should go through flush machinery. Mark it part of flush
@ -439,6 +437,8 @@ void blk_insert_flush(struct request *rq)
spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
return true;
}
/**

348
block/blk-ia-ranges.c Normal file
View File

@ -0,0 +1,348 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Block device concurrent positioning ranges.
*
* Copyright (C) 2021 Western Digital Corporation or its Affiliates.
*/
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/slab.h>
#include <linux/init.h>
#include "blk.h"
static ssize_t
blk_ia_range_sector_show(struct blk_independent_access_range *iar,
char *buf)
{
return sprintf(buf, "%llu\n", iar->sector);
}
static ssize_t
blk_ia_range_nr_sectors_show(struct blk_independent_access_range *iar,
char *buf)
{
return sprintf(buf, "%llu\n", iar->nr_sectors);
}
struct blk_ia_range_sysfs_entry {
struct attribute attr;
ssize_t (*show)(struct blk_independent_access_range *iar, char *buf);
};
static struct blk_ia_range_sysfs_entry blk_ia_range_sector_entry = {
.attr = { .name = "sector", .mode = 0444 },
.show = blk_ia_range_sector_show,
};
static struct blk_ia_range_sysfs_entry blk_ia_range_nr_sectors_entry = {
.attr = { .name = "nr_sectors", .mode = 0444 },
.show = blk_ia_range_nr_sectors_show,
};
static struct attribute *blk_ia_range_attrs[] = {
&blk_ia_range_sector_entry.attr,
&blk_ia_range_nr_sectors_entry.attr,
NULL,
};
ATTRIBUTE_GROUPS(blk_ia_range);
static ssize_t blk_ia_range_sysfs_show(struct kobject *kobj,
struct attribute *attr, char *buf)
{
struct blk_ia_range_sysfs_entry *entry =
container_of(attr, struct blk_ia_range_sysfs_entry, attr);
struct blk_independent_access_range *iar =
container_of(kobj, struct blk_independent_access_range, kobj);
ssize_t ret;
mutex_lock(&iar->queue->sysfs_lock);
ret = entry->show(iar, buf);
mutex_unlock(&iar->queue->sysfs_lock);
return ret;
}
static const struct sysfs_ops blk_ia_range_sysfs_ops = {
.show = blk_ia_range_sysfs_show,
};
/*
* Independent access range entries are not freed individually, but alltogether
* with struct blk_independent_access_ranges and its array of ranges. Since
* kobject_add() takes a reference on the parent kobject contained in
* struct blk_independent_access_ranges, the array of independent access range
* entries cannot be freed until kobject_del() is called for all entries.
* So we do not need to do anything here, but still need this no-op release
* operation to avoid complaints from the kobject code.
*/
static void blk_ia_range_sysfs_nop_release(struct kobject *kobj)
{
}
static struct kobj_type blk_ia_range_ktype = {
.sysfs_ops = &blk_ia_range_sysfs_ops,
.default_groups = blk_ia_range_groups,
.release = blk_ia_range_sysfs_nop_release,
};
/*
* This will be executed only after all independent access range entries are
* removed with kobject_del(), at which point, it is safe to free everything,
* including the array of ranges.
*/
static void blk_ia_ranges_sysfs_release(struct kobject *kobj)
{
struct blk_independent_access_ranges *iars =
container_of(kobj, struct blk_independent_access_ranges, kobj);
kfree(iars);
}
static struct kobj_type blk_ia_ranges_ktype = {
.release = blk_ia_ranges_sysfs_release,
};
/**
* disk_register_ia_ranges - register with sysfs a set of independent
* access ranges
* @disk: Target disk
* @new_iars: New set of independent access ranges
*
* Register with sysfs a set of independent access ranges for @disk.
* If @new_iars is not NULL, this set of ranges is registered and the old set
* specified by q->ia_ranges is unregistered. Otherwise, q->ia_ranges is
* registered if it is not already.
*/
int disk_register_independent_access_ranges(struct gendisk *disk,
struct blk_independent_access_ranges *new_iars)
{
struct request_queue *q = disk->queue;
struct blk_independent_access_ranges *iars;
int i, ret;
lockdep_assert_held(&q->sysfs_dir_lock);
lockdep_assert_held(&q->sysfs_lock);
/* If a new range set is specified, unregister the old one */
if (new_iars) {
if (q->ia_ranges)
disk_unregister_independent_access_ranges(disk);
q->ia_ranges = new_iars;
}
iars = q->ia_ranges;
if (!iars)
return 0;
/*
* At this point, iars is the new set of sector access ranges that needs
* to be registered with sysfs.
*/
WARN_ON(iars->sysfs_registered);
ret = kobject_init_and_add(&iars->kobj, &blk_ia_ranges_ktype,
&q->kobj, "%s", "independent_access_ranges");
if (ret) {
q->ia_ranges = NULL;
kfree(iars);
return ret;
}
for (i = 0; i < iars->nr_ia_ranges; i++) {
iars->ia_range[i].queue = q;
ret = kobject_init_and_add(&iars->ia_range[i].kobj,
&blk_ia_range_ktype, &iars->kobj,
"%d", i);
if (ret) {
while (--i >= 0)
kobject_del(&iars->ia_range[i].kobj);
kobject_del(&iars->kobj);
kobject_put(&iars->kobj);
return ret;
}
}
iars->sysfs_registered = true;
return 0;
}
void disk_unregister_independent_access_ranges(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
struct blk_independent_access_ranges *iars = q->ia_ranges;
int i;
lockdep_assert_held(&q->sysfs_dir_lock);
lockdep_assert_held(&q->sysfs_lock);
if (!iars)
return;
if (iars->sysfs_registered) {
for (i = 0; i < iars->nr_ia_ranges; i++)
kobject_del(&iars->ia_range[i].kobj);
kobject_del(&iars->kobj);
kobject_put(&iars->kobj);
} else {
kfree(iars);
}
q->ia_ranges = NULL;
}
static struct blk_independent_access_range *
disk_find_ia_range(struct blk_independent_access_ranges *iars,
sector_t sector)
{
struct blk_independent_access_range *iar;
int i;
for (i = 0; i < iars->nr_ia_ranges; i++) {
iar = &iars->ia_range[i];
if (sector >= iar->sector &&
sector < iar->sector + iar->nr_sectors)
return iar;
}
return NULL;
}
static bool disk_check_ia_ranges(struct gendisk *disk,
struct blk_independent_access_ranges *iars)
{
struct blk_independent_access_range *iar, *tmp;
sector_t capacity = get_capacity(disk);
sector_t sector = 0;
int i;
/*
* While sorting the ranges in increasing LBA order, check that the
* ranges do not overlap, that there are no sector holes and that all
* sectors belong to one range.
*/
for (i = 0; i < iars->nr_ia_ranges; i++) {
tmp = disk_find_ia_range(iars, sector);
if (!tmp || tmp->sector != sector) {
pr_warn("Invalid non-contiguous independent access ranges\n");
return false;
}
iar = &iars->ia_range[i];
if (tmp != iar) {
swap(iar->sector, tmp->sector);
swap(iar->nr_sectors, tmp->nr_sectors);
}
sector += iar->nr_sectors;
}
if (sector != capacity) {
pr_warn("Independent access ranges do not match disk capacity\n");
return false;
}
return true;
}
static bool disk_ia_ranges_changed(struct gendisk *disk,
struct blk_independent_access_ranges *new)
{
struct blk_independent_access_ranges *old = disk->queue->ia_ranges;
int i;
if (!old)
return true;
if (old->nr_ia_ranges != new->nr_ia_ranges)
return true;
for (i = 0; i < old->nr_ia_ranges; i++) {
if (new->ia_range[i].sector != old->ia_range[i].sector ||
new->ia_range[i].nr_sectors != old->ia_range[i].nr_sectors)
return true;
}
return false;
}
/**
* disk_alloc_independent_access_ranges - Allocate an independent access ranges
* data structure
* @disk: target disk
* @nr_ia_ranges: Number of independent access ranges
*
* Allocate a struct blk_independent_access_ranges structure with @nr_ia_ranges
* access range descriptors.
*/
struct blk_independent_access_ranges *
disk_alloc_independent_access_ranges(struct gendisk *disk, int nr_ia_ranges)
{
struct blk_independent_access_ranges *iars;
iars = kzalloc_node(struct_size(iars, ia_range, nr_ia_ranges),
GFP_KERNEL, disk->queue->node);
if (iars)
iars->nr_ia_ranges = nr_ia_ranges;
return iars;
}
EXPORT_SYMBOL_GPL(disk_alloc_independent_access_ranges);
/**
* disk_set_independent_access_ranges - Set a disk independent access ranges
* @disk: target disk
* @iars: independent access ranges structure
*
* Set the independent access ranges information of the request queue
* of @disk to @iars. If @iars is NULL and the independent access ranges
* structure already set is cleared. If there are no differences between
* @iars and the independent access ranges structure already set, @iars
* is freed.
*/
void disk_set_independent_access_ranges(struct gendisk *disk,
struct blk_independent_access_ranges *iars)
{
struct request_queue *q = disk->queue;
if (WARN_ON_ONCE(iars && !iars->nr_ia_ranges)) {
kfree(iars);
iars = NULL;
}
mutex_lock(&q->sysfs_dir_lock);
mutex_lock(&q->sysfs_lock);
if (iars) {
if (!disk_check_ia_ranges(disk, iars)) {
kfree(iars);
iars = NULL;
goto reg;
}
if (!disk_ia_ranges_changed(disk, iars)) {
kfree(iars);
goto unlock;
}
}
/*
* This may be called for a registered queue. E.g. during a device
* revalidation. If that is the case, we need to unregister the old
* set of independent access ranges and register the new set. If the
* queue is not registered, registration of the device request queue
* will register the independent access ranges, so only swap in the
* new set and free the old one.
*/
reg:
if (blk_queue_registered(q)) {
disk_register_independent_access_ranges(disk, iars);
} else {
swap(q->ia_ranges, iars);
kfree(iars);
}
unlock:
mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);
}
EXPORT_SYMBOL_GPL(disk_set_independent_access_ranges);

View File

@ -6,7 +6,7 @@
* Written by: Martin K. Petersen <martin.petersen@oracle.com>
*/
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/backing-dev.h>
#include <linux/mempool.h>
#include <linux/bio.h>
@ -409,9 +409,9 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
if (disk->queue->crypto_profile) {
pr_warn("blk-integrity: Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
blk_ksm_unregister(disk->queue);
blk_crypto_unregister(disk->queue);
}
#endif
}

View File

@ -3165,12 +3165,12 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
ioc = q_to_ioc(bdev->bd_disk->queue);
ioc = q_to_ioc(bdev_get_queue(bdev));
if (!ioc) {
ret = blk_iocost_init(bdev->bd_disk->queue);
ret = blk_iocost_init(bdev_get_queue(bdev));
if (ret)
goto err;
ioc = q_to_ioc(bdev->bd_disk->queue);
ioc = q_to_ioc(bdev_get_queue(bdev));
}
spin_lock_irq(&ioc->lock);
@ -3332,12 +3332,12 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
if (IS_ERR(bdev))
return PTR_ERR(bdev);
ioc = q_to_ioc(bdev->bd_disk->queue);
ioc = q_to_ioc(bdev_get_queue(bdev));
if (!ioc) {
ret = blk_iocost_init(bdev->bd_disk->queue);
ret = blk_iocost_init(bdev_get_queue(bdev));
if (ret)
goto err;
ioc = q_to_ioc(bdev->bd_disk->queue);
ioc = q_to_ioc(bdev_get_queue(bdev));
}
spin_lock_irq(&ioc->lock);

View File

@ -74,6 +74,7 @@
#include <linux/sched/signal.h>
#include <trace/events/block.h>
#include <linux/blk-mq.h>
#include <linux/blk-cgroup.h>
#include "blk-rq-qos.h"
#include "blk-stat.h"
#include "blk.h"

View File

@ -6,12 +6,45 @@
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/scatterlist.h>
#include <trace/events/block.h>
#include "blk.h"
#include "blk-rq-qos.h"
#include "blk-throttle.h"
static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
{
*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
}
static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
{
struct bvec_iter iter = bio->bi_iter;
int idx;
bio_get_first_bvec(bio, bv);
if (bv->bv_len == bio->bi_iter.bi_size)
return; /* this bio only has a single bvec */
bio_advance_iter(bio, &iter, iter.bi_size);
if (!iter.bi_bvec_done)
idx = iter.bi_idx - 1;
else /* in the middle of bvec */
idx = iter.bi_idx;
*bv = bio->bi_io_vec[idx];
/*
* iter.bi_bvec_done records actual length of the last bvec
* if this bio ends in the middle of one io vector
*/
if (iter.bi_bvec_done)
bv->bv_len = iter.bi_bvec_done;
}
static inline bool bio_will_gap(struct request_queue *q,
struct request *prev_rq, struct bio *prev, struct bio *next)
@ -285,13 +318,13 @@ split:
* iopoll in direct IO routine. Given performance gain of iopoll for
* big IO can be trival, disable iopoll when split needed.
*/
bio_clear_hipri(bio);
bio_clear_polled(bio);
return bio_split(bio, sectors, GFP_NOIO, bs);
}
/**
* __blk_queue_split - split a bio and submit the second half
* @q: [in] request_queue new bio is being queued at
* @bio: [in, out] bio to be split
* @nr_segs: [out] number of segments in the first bio
*
@ -302,9 +335,9 @@ split:
* of the caller to ensure that q->bio_split is only released after processing
* of the split bio has finished.
*/
void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
void __blk_queue_split(struct request_queue *q, struct bio **bio,
unsigned int *nr_segs)
{
struct request_queue *q = (*bio)->bi_bdev->bd_disk->queue;
struct bio *split = NULL;
switch (bio_op(*bio)) {
@ -321,21 +354,6 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
nr_segs);
break;
default:
/*
* All drivers must accept single-segments bios that are <=
* PAGE_SIZE. This is a quick and dirty check that relies on
* the fact that bi_io_vec[0] is always valid if a bio has data.
* The check might lead to occasional false negatives when bios
* are cloned, but compared to the performance impact of cloned
* bios themselves the loop below doesn't matter anyway.
*/
if (!q->limits.chunk_sectors &&
(*bio)->bi_vcnt == 1 &&
((*bio)->bi_io_vec[0].bv_len +
(*bio)->bi_io_vec[0].bv_offset) <= PAGE_SIZE) {
*nr_segs = 1;
break;
}
split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
break;
}
@ -365,9 +383,11 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
*/
void blk_queue_split(struct bio **bio)
{
struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
unsigned int nr_segs;
__blk_queue_split(bio, &nr_segs);
if (blk_may_split(q, *bio))
__blk_queue_split(q, bio, &nr_segs);
}
EXPORT_SYMBOL(blk_queue_split);
@ -558,6 +578,23 @@ static inline unsigned int blk_rq_get_max_segments(struct request *rq)
return queue_max_segments(rq->q);
}
static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
sector_t offset)
{
struct request_queue *q = rq->q;
if (blk_rq_is_passthrough(rq))
return q->limits.max_hw_sectors;
if (!q->limits.chunk_sectors ||
req_op(rq) == REQ_OP_DISCARD ||
req_op(rq) == REQ_OP_SECURE_ERASE)
return blk_queue_get_max_sectors(q, req_op(rq));
return min(blk_max_size_offset(q, offset, 0),
blk_queue_get_max_sectors(q, req_op(rq)));
}
static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
unsigned int nr_phys_segs)
{
@ -718,6 +755,13 @@ static enum elv_merge blk_try_req_merge(struct request *req,
return ELEVATOR_NO_MERGE;
}
static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
{
if (bio_page(a) == bio_page(b) && bio_offset(a) == bio_offset(b))
return true;
return false;
}
/*
* For non-mq, this has to be called with the request spinlock acquired.
* For mq with scheduling, the appropriate queue wide lock should be held.
@ -1023,12 +1067,11 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
* @q: request_queue new bio is being queued at
* @bio: new bio being queued
* @nr_segs: number of segments in @bio
* @same_queue_rq: pointer to &struct request that gets filled in when
* another request associated with @q is found on the plug list
* (optional, may be %NULL)
* @same_queue_rq: output value, will be true if there's an existing request
* from the passed in @q already in the plug list
*
* Determine whether @bio being queued on @q can be merged with a request
* on %current's plugged list. Returns %true if merge was successful,
* Determine whether @bio being queued on @q can be merged with the previous
* request on %current's plugged list. Returns %true if merge was successful,
* otherwise %false.
*
* Plugging coalesces IOs from the same issuer for the same purpose without
@ -1041,36 +1084,26 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
* Caller must ensure !blk_queue_nomerges(q) beforehand.
*/
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq)
unsigned int nr_segs, bool *same_queue_rq)
{
struct blk_plug *plug;
struct request *rq;
struct list_head *plug_list;
plug = blk_mq_plug(q, bio);
if (!plug)
if (!plug || rq_list_empty(plug->mq_list))
return false;
plug_list = &plug->mq_list;
list_for_each_entry_reverse(rq, plug_list, queuelist) {
if (rq->q == q && same_queue_rq) {
/*
* Only blk-mq multiple hardware queues case checks the
* rq in the same queue, there should be only one such
* rq in a queue
**/
*same_queue_rq = rq;
}
if (rq->q != q)
continue;
if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
BIO_MERGE_OK)
return true;
/* check the previously added entry for a quick merge attempt */
rq = rq_list_peek(&plug->mq_list);
if (rq->q == q) {
/*
* Only blk-mq multiple hardware queues case checks the rq in
* the same queue, there should be only one such rq in a queue
*/
*same_queue_rq = true;
}
if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) == BIO_MERGE_OK)
return true;
return false;
}

View File

@ -287,7 +287,7 @@ static const char *const cmd_flag_name[] = {
CMD_FLAG_NAME(BACKGROUND),
CMD_FLAG_NAME(NOWAIT),
CMD_FLAG_NAME(NOUNMAP),
CMD_FLAG_NAME(HIPRI),
CMD_FLAG_NAME(POLLED),
};
#undef CMD_FLAG_NAME
@ -453,11 +453,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
atomic_read(&tags->active_queues));
seq_puts(m, "\nbitmap_tags:\n");
sbitmap_queue_show(tags->bitmap_tags, m);
sbitmap_queue_show(&tags->bitmap_tags, m);
if (tags->nr_reserved_tags) {
seq_puts(m, "\nbreserved_tags:\n");
sbitmap_queue_show(tags->breserved_tags, m);
sbitmap_queue_show(&tags->breserved_tags, m);
}
}
@ -488,7 +488,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->tags)
sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
out:
@ -522,77 +522,13 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->sched_tags)
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
out:
return res;
}
static int hctx_io_poll_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
seq_printf(m, "considered=%lu\n", hctx->poll_considered);
seq_printf(m, "invoked=%lu\n", hctx->poll_invoked);
seq_printf(m, "success=%lu\n", hctx->poll_success);
return 0;
}
static ssize_t hctx_io_poll_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_hw_ctx *hctx = data;
hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
return count;
}
static int hctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
int i;
seq_printf(m, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
unsigned int d = 1U << (i - 1);
seq_printf(m, "%8u\t%lu\n", d, hctx->dispatched[i]);
}
seq_printf(m, "%8u+\t%lu\n", 1U << (i - 1), hctx->dispatched[i]);
return 0;
}
static ssize_t hctx_dispatched_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_hw_ctx *hctx = data;
int i;
for (i = 0; i < BLK_MQ_MAX_DISPATCH_ORDER; i++)
hctx->dispatched[i] = 0;
return count;
}
static int hctx_queued_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
seq_printf(m, "%lu\n", hctx->queued);
return 0;
}
static ssize_t hctx_queued_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_hw_ctx *hctx = data;
hctx->queued = 0;
return count;
}
static int hctx_run_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
@ -614,7 +550,7 @@ static int hctx_active_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
seq_printf(m, "%d\n", __blk_mq_active_requests(hctx));
return 0;
}
@ -663,57 +599,6 @@ CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
static int ctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
seq_printf(m, "%lu %lu\n", ctx->rq_dispatched[1], ctx->rq_dispatched[0]);
return 0;
}
static ssize_t ctx_dispatched_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_ctx *ctx = data;
ctx->rq_dispatched[0] = ctx->rq_dispatched[1] = 0;
return count;
}
static int ctx_merged_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
seq_printf(m, "%lu\n", ctx->rq_merged);
return 0;
}
static ssize_t ctx_merged_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_ctx *ctx = data;
ctx->rq_merged = 0;
return count;
}
static int ctx_completed_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
seq_printf(m, "%lu %lu\n", ctx->rq_completed[1], ctx->rq_completed[0]);
return 0;
}
static ssize_t ctx_completed_write(void *data, const char __user *buf,
size_t count, loff_t *ppos)
{
struct blk_mq_ctx *ctx = data;
ctx->rq_completed[0] = ctx->rq_completed[1] = 0;
return count;
}
static int blk_mq_debugfs_show(struct seq_file *m, void *v)
{
const struct blk_mq_debugfs_attr *attr = m->private;
@ -789,9 +674,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"tags_bitmap", 0400, hctx_tags_bitmap_show},
{"sched_tags", 0400, hctx_sched_tags_show},
{"sched_tags_bitmap", 0400, hctx_sched_tags_bitmap_show},
{"io_poll", 0600, hctx_io_poll_show, hctx_io_poll_write},
{"dispatched", 0600, hctx_dispatched_show, hctx_dispatched_write},
{"queued", 0600, hctx_queued_show, hctx_queued_write},
{"run", 0600, hctx_run_show, hctx_run_write},
{"active", 0400, hctx_active_show},
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
@ -803,9 +685,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
{"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
{"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
{"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
{"merged", 0600, ctx_merged_show, ctx_merged_write},
{"completed", 0600, ctx_completed_show, ctx_completed_write},
{},
};

View File

@ -57,10 +57,8 @@ void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
}
EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
{
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
return;
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
/*
@ -363,7 +361,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
}
}
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
{
struct elevator_queue *e = q->elevator;
@ -389,13 +387,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
* potentially merge with. Currently includes a hand-wavy stop
* count of 8, to not spend too much time checking for merges.
*/
if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
ctx->rq_merged++;
if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs))
ret = true;
}
spin_unlock(&ctx->lock);
return ret;
}
@ -515,83 +510,71 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
percpu_ref_put(&q->q_usage_counter);
}
static int blk_mq_sched_alloc_tags(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
struct blk_mq_tag_set *set = q->tag_set;
int ret;
hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
set->reserved_tags, set->flags);
if (!hctx->sched_tags)
return -ENOMEM;
ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
if (ret) {
blk_mq_free_rq_map(hctx->sched_tags, set->flags);
hctx->sched_tags = NULL;
if (blk_mq_is_shared_tags(q->tag_set->flags)) {
hctx->sched_tags = q->sched_shared_tags;
return 0;
}
return ret;
hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
q->nr_requests);
if (!hctx->sched_tags)
return -ENOMEM;
return 0;
}
static void blk_mq_exit_sched_shared_tags(struct request_queue *queue)
{
blk_mq_free_rq_map(queue->sched_shared_tags);
queue->sched_shared_tags = NULL;
}
/* called in queue's release handler, tagset has gone away */
static void blk_mq_sched_tags_teardown(struct request_queue *q)
static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags)
{
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags) {
blk_mq_free_rq_map(hctx->sched_tags, hctx->flags);
if (!blk_mq_is_shared_tags(flags))
blk_mq_free_rq_map(hctx->sched_tags);
hctx->sched_tags = NULL;
}
}
if (blk_mq_is_shared_tags(flags))
blk_mq_exit_sched_shared_tags(q);
}
static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
static int blk_mq_init_sched_shared_tags(struct request_queue *queue)
{
struct blk_mq_tag_set *set = queue->tag_set;
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
struct blk_mq_hw_ctx *hctx;
int ret, i;
/*
* Set initial depth at max so that we don't need to reallocate for
* updating nr_requests.
*/
ret = blk_mq_init_bitmaps(&queue->sched_bitmap_tags,
&queue->sched_breserved_tags,
MAX_SCHED_RQ, set->reserved_tags,
set->numa_node, alloc_policy);
if (ret)
return ret;
queue->sched_shared_tags = blk_mq_alloc_map_and_rqs(set,
BLK_MQ_NO_HCTX_IDX,
MAX_SCHED_RQ);
if (!queue->sched_shared_tags)
return -ENOMEM;
queue_for_each_hw_ctx(queue, hctx, i) {
hctx->sched_tags->bitmap_tags =
&queue->sched_bitmap_tags;
hctx->sched_tags->breserved_tags =
&queue->sched_breserved_tags;
}
sbitmap_queue_resize(&queue->sched_bitmap_tags,
queue->nr_requests - set->reserved_tags);
blk_mq_tag_update_sched_shared_tags(queue);
return 0;
}
static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
{
sbitmap_queue_free(&queue->sched_bitmap_tags);
sbitmap_queue_free(&queue->sched_breserved_tags);
}
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
{
unsigned int i, flags = q->tag_set->flags;
struct blk_mq_hw_ctx *hctx;
struct elevator_queue *eq;
unsigned int i;
int ret;
if (!e) {
@ -606,23 +589,23 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
* Additionally, this is a per-hw queue depth.
*/
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
BLKDEV_MAX_RQ);
BLKDEV_DEFAULT_RQ);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_sched_alloc_tags(q, hctx, i);
if (blk_mq_is_shared_tags(flags)) {
ret = blk_mq_init_sched_shared_tags(q);
if (ret)
goto err_free_tags;
return ret;
}
if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
ret = blk_mq_init_sched_shared_sbitmap(q);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
if (ret)
goto err_free_tags;
goto err_free_map_and_rqs;
}
ret = e->ops.init_sched(q, e);
if (ret)
goto err_free_sbitmap;
goto err_free_map_and_rqs;
blk_mq_debugfs_register_sched(q);
@ -631,7 +614,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
ret = e->ops.init_hctx(hctx, i);
if (ret) {
eq = q->elevator;
blk_mq_sched_free_requests(q);
blk_mq_sched_free_rqs(q);
blk_mq_exit_sched(q, eq);
kobject_put(&eq->kobj);
return ret;
@ -642,12 +625,10 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
return 0;
err_free_sbitmap:
if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
blk_mq_exit_sched_shared_sbitmap(q);
err_free_tags:
blk_mq_sched_free_requests(q);
blk_mq_sched_tags_teardown(q);
err_free_map_and_rqs:
blk_mq_sched_free_rqs(q);
blk_mq_sched_tags_teardown(q, flags);
q->elevator = NULL;
return ret;
}
@ -656,14 +637,20 @@ err_free_tags:
* called in either blk_queue_cleanup or elevator_switch, tagset
* is required for freeing requests
*/
void blk_mq_sched_free_requests(struct request_queue *q)
void blk_mq_sched_free_rqs(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags)
blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
if (blk_mq_is_shared_tags(q->tag_set->flags)) {
blk_mq_free_rqs(q->tag_set, q->sched_shared_tags,
BLK_MQ_NO_HCTX_IDX);
} else {
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags)
blk_mq_free_rqs(q->tag_set,
hctx->sched_tags, i);
}
}
}
@ -684,8 +671,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
blk_mq_debugfs_unregister_sched(q);
if (e->type->ops.exit_sched)
e->type->ops.exit_sched(e);
blk_mq_sched_tags_teardown(q);
if (blk_mq_is_sbitmap_shared(flags))
blk_mq_exit_sched_shared_sbitmap(q);
blk_mq_sched_tags_teardown(q, flags);
q->elevator = NULL;
}

View File

@ -2,21 +2,22 @@
#ifndef BLK_MQ_SCHED_H
#define BLK_MQ_SCHED_H
#include "elevator.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
#define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
void blk_mq_sched_assign_ioc(struct request *rq);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **merged_request);
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
struct list_head *free);
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async);
@ -28,45 +29,51 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
void blk_mq_sched_free_requests(struct request_queue *q);
void blk_mq_sched_free_rqs(struct request_queue *q);
static inline bool
blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
{
if (blk_queue_nomerges(q) || !bio_mergeable(bio))
return false;
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
__blk_mq_sched_restart(hctx);
}
return __blk_mq_sched_bio_merge(q, bio, nr_segs);
static inline bool bio_mergeable(struct bio *bio)
{
return !(bio->bi_opf & REQ_NOMERGE_FLAGS);
}
static inline bool
blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.allow_merge)
return e->type->ops.allow_merge(q, rq, bio);
if (rq->rq_flags & RQF_ELV) {
struct elevator_queue *e = q->elevator;
if (e->type->ops.allow_merge)
return e->type->ops.allow_merge(q, rq, bio);
}
return true;
}
static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
{
struct elevator_queue *e = rq->q->elevator;
if (rq->rq_flags & RQF_ELV) {
struct elevator_queue *e = rq->q->elevator;
if (e && e->type->ops.completed_request)
e->type->ops.completed_request(rq, now);
if (e->type->ops.completed_request)
e->type->ops.completed_request(rq, now);
}
}
static inline void blk_mq_sched_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (rq->rq_flags & RQF_ELV) {
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
e->type->ops.requeue_request(rq);
if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
e->type->ops.requeue_request(rq);
}
}
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)

View File

@ -24,13 +24,12 @@
*/
bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
!test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
atomic_inc(&set->active_queues_shared_sbitmap);
atomic_inc(&hctx->tags->active_queues);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
!test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
@ -45,9 +44,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
*/
void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
{
sbitmap_queue_wake_all(tags->bitmap_tags);
sbitmap_queue_wake_all(&tags->bitmap_tags);
if (include_reserve)
sbitmap_queue_wake_all(tags->breserved_tags);
sbitmap_queue_wake_all(&tags->breserved_tags);
}
/*
@ -57,20 +56,20 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->tags;
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;
if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
&q->queue_flags))
return;
atomic_dec(&set->active_queues_shared_sbitmap);
} else {
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return;
atomic_dec(&tags->active_queues);
}
atomic_dec(&tags->active_queues);
blk_mq_tag_wakeup_all(tags, false);
}
@ -87,6 +86,21 @@ static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
return __sbitmap_queue_get(bt);
}
unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
unsigned int *offset)
{
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
struct sbitmap_queue *bt = &tags->bitmap_tags;
unsigned long ret;
if (data->shallow_depth ||data->flags & BLK_MQ_REQ_RESERVED ||
data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
return 0;
ret = __sbitmap_queue_get_batch(bt, nr_tags, offset);
*offset += tags->nr_reserved_tags;
return ret;
}
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
{
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
@ -101,10 +115,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
WARN_ON_ONCE(1);
return BLK_MQ_NO_TAG;
}
bt = tags->breserved_tags;
bt = &tags->breserved_tags;
tag_offset = 0;
} else {
bt = tags->bitmap_tags;
bt = &tags->bitmap_tags;
tag_offset = tags->nr_reserved_tags;
}
@ -150,9 +164,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
data->ctx);
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
bt = tags->breserved_tags;
bt = &tags->breserved_tags;
else
bt = tags->bitmap_tags;
bt = &tags->bitmap_tags;
/*
* If destination hw queue is changed, fake wake up on
@ -186,13 +200,19 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
const int real_tag = tag - tags->nr_reserved_tags;
BUG_ON(real_tag >= tags->nr_tags);
sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
} else {
BUG_ON(tag >= tags->nr_reserved_tags);
sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
}
}
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)
{
sbitmap_queue_clear_batch(&tags->bitmap_tags, tags->nr_reserved_tags,
tag_array, nr_tags);
}
struct bt_iter_data {
struct blk_mq_hw_ctx *hctx;
busy_iter_fn *fn;
@ -340,9 +360,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
if (tags->nr_reserved_tags)
bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
flags | BT_TAG_ITER_RESERVED);
bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
}
/**
@ -379,9 +399,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
{
int i;
unsigned int flags = tagset->flags;
int i, nr_tags;
for (i = 0; i < tagset->nr_hw_queues; i++) {
nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
for (i = 0; i < nr_tags; i++) {
if (tagset->tags && tagset->tags[i])
__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
BT_TAG_ITER_STARTED);
@ -459,8 +482,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
continue;
if (tags->nr_reserved_tags)
bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
}
blk_queue_exit(q);
}
@ -492,56 +515,10 @@ free_bitmap_tags:
return -ENOMEM;
}
static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
int node, int alloc_policy)
{
int ret;
ret = blk_mq_init_bitmaps(&tags->__bitmap_tags,
&tags->__breserved_tags,
tags->nr_tags, tags->nr_reserved_tags,
node, alloc_policy);
if (ret)
return ret;
tags->bitmap_tags = &tags->__bitmap_tags;
tags->breserved_tags = &tags->__breserved_tags;
return 0;
}
int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set)
{
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
int i, ret;
ret = blk_mq_init_bitmaps(&set->__bitmap_tags, &set->__breserved_tags,
set->queue_depth, set->reserved_tags,
set->numa_node, alloc_policy);
if (ret)
return ret;
for (i = 0; i < set->nr_hw_queues; i++) {
struct blk_mq_tags *tags = set->tags[i];
tags->bitmap_tags = &set->__bitmap_tags;
tags->breserved_tags = &set->__breserved_tags;
}
return 0;
}
void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
{
sbitmap_queue_free(&set->__bitmap_tags);
sbitmap_queue_free(&set->__breserved_tags);
}
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
unsigned int reserved_tags,
int node, unsigned int flags)
int node, int alloc_policy)
{
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
struct blk_mq_tags *tags;
if (total_tags > BLK_MQ_TAG_MAX) {
@ -557,22 +534,19 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_reserved_tags = reserved_tags;
spin_lock_init(&tags->lock);
if (blk_mq_is_sbitmap_shared(flags))
return tags;
if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
if (blk_mq_init_bitmaps(&tags->bitmap_tags, &tags->breserved_tags,
total_tags, reserved_tags, node,
alloc_policy) < 0) {
kfree(tags);
return NULL;
}
return tags;
}
void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
void blk_mq_free_tags(struct blk_mq_tags *tags)
{
if (!blk_mq_is_sbitmap_shared(flags)) {
sbitmap_queue_free(tags->bitmap_tags);
sbitmap_queue_free(tags->breserved_tags);
}
sbitmap_queue_free(&tags->bitmap_tags);
sbitmap_queue_free(&tags->breserved_tags);
kfree(tags);
}
@ -592,7 +566,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (tdepth > tags->nr_tags) {
struct blk_mq_tag_set *set = hctx->queue->tag_set;
struct blk_mq_tags *new;
bool ret;
if (!can_grow)
return -EINVAL;
@ -604,34 +577,42 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (tdepth > MAX_SCHED_RQ)
return -EINVAL;
new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
tags->nr_reserved_tags, set->flags);
/*
* Only the sbitmap needs resizing since we allocated the max
* initially.
*/
if (blk_mq_is_shared_tags(set->flags))
return 0;
new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
if (!new)
return -ENOMEM;
ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
if (ret) {
blk_mq_free_rq_map(new, set->flags);
return -ENOMEM;
}
blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
blk_mq_free_rq_map(*tagsptr, set->flags);
blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
*tagsptr = new;
} else {
/*
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
sbitmap_queue_resize(tags->bitmap_tags,
sbitmap_queue_resize(&tags->bitmap_tags,
tdepth - tags->nr_reserved_tags);
}
return 0;
}
void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set, unsigned int size)
{
sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
struct blk_mq_tags *tags = set->shared_tags;
sbitmap_queue_resize(&tags->bitmap_tags, size - set->reserved_tags);
}
void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
{
sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
q->nr_requests - q->tag_set->reserved_tags);
}
/**

View File

@ -2,52 +2,30 @@
#ifndef INT_BLK_MQ_TAG_H
#define INT_BLK_MQ_TAG_H
/*
* Tag address space map.
*/
struct blk_mq_tags {
unsigned int nr_tags;
unsigned int nr_reserved_tags;
atomic_t active_queues;
struct sbitmap_queue *bitmap_tags;
struct sbitmap_queue *breserved_tags;
struct sbitmap_queue __bitmap_tags;
struct sbitmap_queue __breserved_tags;
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
/*
* used to clear request reference in rqs[] before freeing one
* request pool
*/
spinlock_t lock;
};
struct blk_mq_alloc_data;
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
unsigned int reserved_tags,
int node, unsigned int flags);
extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
int node, int alloc_policy);
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
struct sbitmap_queue *breserved_tags,
unsigned int queue_depth,
unsigned int reserved,
int node, int alloc_policy);
extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set);
extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
unsigned int *offset);
extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
unsigned int tag);
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
extern void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
unsigned int size);
extern void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,

File diff suppressed because it is too large Load Diff

View File

@ -25,18 +25,14 @@ struct blk_mq_ctx {
unsigned short index_hw[HCTX_MAX_TYPES];
struct blk_mq_hw_ctx *hctxs[HCTX_MAX_TYPES];
/* incremented at dispatch time */
unsigned long rq_dispatched[2];
unsigned long rq_merged;
/* incremented at completion time */
unsigned long ____cacheline_aligned_in_smp rq_completed[2];
struct request_queue *queue;
struct blk_mq_ctxs *ctxs;
struct kobject kobj;
} ____cacheline_aligned_in_smp;
void blk_mq_submit_bio(struct bio *bio);
int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
unsigned int flags);
void blk_mq_exit_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
@ -54,15 +50,12 @@ void blk_mq_put_rq_ref(struct request *rq);
*/
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx);
void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
unsigned int hctx_idx,
unsigned int nr_tags,
unsigned int reserved_tags,
unsigned int flags);
int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx, unsigned int depth);
void blk_mq_free_rq_map(struct blk_mq_tags *tags);
struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
unsigned int hctx_idx, unsigned int depth);
void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
struct blk_mq_tags *tags,
unsigned int hctx_idx);
/*
* Internal helpers for request insertion into sw queues
*/
@ -109,9 +102,9 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
enum hctx_type type = HCTX_TYPE_DEFAULT;
/*
* The caller ensure that if REQ_HIPRI, poll must be enabled.
* The caller ensure that if REQ_POLLED, poll must be enabled.
*/
if (flags & REQ_HIPRI)
if (flags & REQ_POLLED)
type = HCTX_TYPE_POLL;
else if ((flags & REQ_OP_MASK) == REQ_OP_READ)
type = HCTX_TYPE_READ;
@ -128,6 +121,8 @@ extern int __blk_mq_register_dev(struct device *dev, struct request_queue *q);
extern int blk_mq_sysfs_register(struct request_queue *q);
extern void blk_mq_sysfs_unregister(struct request_queue *q);
extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
void blk_mq_free_plug_rqs(struct blk_plug *plug);
void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule);
void blk_mq_release(struct request_queue *q);
@ -154,23 +149,27 @@ struct blk_mq_alloc_data {
blk_mq_req_flags_t flags;
unsigned int shallow_depth;
unsigned int cmd_flags;
unsigned int rq_flags;
/* allocate multiple requests/tags in one go */
unsigned int nr_tags;
struct request **cached_rq;
/* input & output parameter */
struct blk_mq_ctx *ctx;
struct blk_mq_hw_ctx *hctx;
};
static inline bool blk_mq_is_sbitmap_shared(unsigned int flags)
static inline bool blk_mq_is_shared_tags(unsigned int flags)
{
return flags & BLK_MQ_F_TAG_HCTX_SHARED;
}
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
{
if (data->q->elevator)
return data->hctx->sched_tags;
return data->hctx->tags;
if (!(data->rq_flags & RQF_ELV))
return data->hctx->tags;
return data->hctx->sched_tags;
}
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
@ -220,24 +219,24 @@ static inline int blk_mq_get_rq_budget_token(struct request *rq)
static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap);
if (blk_mq_is_shared_tags(hctx->flags))
atomic_inc(&hctx->queue->nr_active_requests_shared_tags);
else
atomic_inc(&hctx->nr_active);
}
static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap);
if (blk_mq_is_shared_tags(hctx->flags))
atomic_dec(&hctx->queue->nr_active_requests_shared_tags);
else
atomic_dec(&hctx->nr_active);
}
static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap);
if (blk_mq_is_shared_tags(hctx->flags))
return atomic_read(&hctx->queue->nr_active_requests_shared_tags);
return atomic_read(&hctx->nr_active);
}
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
@ -260,7 +259,20 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
__blk_mq_put_driver_tag(rq->mq_hctx, rq);
}
bool blk_mq_get_driver_tag(struct request *rq);
bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq);
static inline bool blk_mq_get_driver_tag(struct request *rq)
{
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
if (rq->tag != BLK_MQ_NO_TAG &&
!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
hctx->tags->rqs[rq->tag] = rq;
return true;
}
return __blk_mq_get_driver_tag(hctx, rq);
}
static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
{
@ -331,19 +343,18 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
if (bt->sb.depth == 1)
return true;
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
return true;
users = atomic_read(&set->active_queues_shared_sbitmap);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return true;
users = atomic_read(&hctx->tags->active_queues);
}
users = atomic_read(&hctx->tags->active_queues);
if (!users)
return true;

View File

@ -189,9 +189,10 @@ static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
* BIO_TRACKED lets controllers know that a bio went through the
* normal rq_qos path.
*/
bio_set_flag(bio, BIO_TRACKED);
if (q->rq_qos)
if (q->rq_qos) {
bio_set_flag(bio, BIO_TRACKED);
__rq_qos_throttle(q->rq_qos, bio);
}
}
static inline void rq_qos_track(struct request_queue *q, struct request *rq,

View File

@ -17,6 +17,7 @@
#include "blk-mq.h"
#include "blk-mq-debugfs.h"
#include "blk-wbt.h"
#include "blk-throttle.h"
struct queue_sysfs_entry {
struct attribute attr;
@ -432,26 +433,11 @@ static ssize_t queue_poll_show(struct request_queue *q, char *page)
static ssize_t queue_poll_store(struct request_queue *q, const char *page,
size_t count)
{
unsigned long poll_on;
ssize_t ret;
if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
!q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
return -EINVAL;
ret = queue_var_store(&poll_on, page, count);
if (ret < 0)
return ret;
if (poll_on) {
blk_queue_flag_set(QUEUE_FLAG_POLL, q);
} else {
blk_mq_freeze_queue(q);
blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
blk_mq_unfreeze_queue(q);
}
return ret;
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
pr_info_ratelimited("please use driver specific parameters instead.\n");
return count;
}
static ssize_t queue_io_timeout_show(struct request_queue *q, char *page)
@ -887,16 +873,15 @@ int blk_register_queue(struct gendisk *disk)
}
mutex_lock(&q->sysfs_lock);
ret = disk_register_independent_access_ranges(disk, NULL);
if (ret)
goto put_dev;
if (q->elevator) {
ret = elv_register_queue(q, false);
if (ret) {
mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
kobject_put(&dev->kobj);
return ret;
}
if (ret)
goto put_dev;
}
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
@ -927,6 +912,16 @@ unlock:
percpu_ref_switch_to_percpu(&q->q_usage_counter);
}
return ret;
put_dev:
disk_unregister_independent_access_ranges(disk);
mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
kobject_put(&dev->kobj);
return ret;
}
@ -972,6 +967,7 @@ void blk_unregister_queue(struct gendisk *disk)
mutex_lock(&q->sysfs_lock);
if (q->elevator)
elv_unregister_queue(q);
disk_unregister_independent_access_ranges(disk);
mutex_unlock(&q->sysfs_lock);
mutex_unlock(&q->sysfs_dir_lock);

View File

@ -13,6 +13,7 @@
#include <linux/blk-cgroup.h>
#include "blk.h"
#include "blk-cgroup-rwstat.h"
#include "blk-throttle.h"
/* Max dispatch from a group in 1 round */
#define THROTL_GRP_QUANTUM 8
@ -37,60 +38,9 @@
*/
#define LATENCY_FILTERED_HD (1000L) /* 1ms */
static struct blkcg_policy blkcg_policy_throtl;
/* A workqueue to queue throttle related work */
static struct workqueue_struct *kthrotld_workqueue;
/*
* To implement hierarchical throttling, throtl_grps form a tree and bios
* are dispatched upwards level by level until they reach the top and get
* issued. When dispatching bios from the children and local group at each
* level, if the bios are dispatched into a single bio_list, there's a risk
* of a local or child group which can queue many bios at once filling up
* the list starving others.
*
* To avoid such starvation, dispatched bios are queued separately
* according to where they came from. When they are again dispatched to
* the parent, they're popped in round-robin order so that no single source
* hogs the dispatch window.
*
* throtl_qnode is used to keep the queued bios separated by their sources.
* Bios are queued to throtl_qnode which in turn is queued to
* throtl_service_queue and then dispatched in round-robin order.
*
* It's also used to track the reference counts on blkg's. A qnode always
* belongs to a throtl_grp and gets queued on itself or the parent, so
* incrementing the reference of the associated throtl_grp when a qnode is
* queued and decrementing when dequeued is enough to keep the whole blkg
* tree pinned while bios are in flight.
*/
struct throtl_qnode {
struct list_head node; /* service_queue->queued[] */
struct bio_list bios; /* queued bios */
struct throtl_grp *tg; /* tg this qnode belongs to */
};
struct throtl_service_queue {
struct throtl_service_queue *parent_sq; /* the parent service_queue */
/*
* Bios queued directly to this service_queue or dispatched from
* children throtl_grp's.
*/
struct list_head queued[2]; /* throtl_qnode [READ/WRITE] */
unsigned int nr_queued[2]; /* number of queued bios */
/*
* RB tree of active children throtl_grp's, which are sorted by
* their ->disptime.
*/
struct rb_root_cached pending_tree; /* RB tree of active tgs */
unsigned int nr_pending; /* # queued in the tree */
unsigned long first_pending_disptime; /* disptime of the first tg */
struct timer_list pending_timer; /* fires on first_pending_disptime */
};
enum tg_state_flags {
THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */
THROTL_TG_WAS_EMPTY = 1 << 1, /* bio_lists[] became non-empty */
@ -98,93 +48,6 @@ enum tg_state_flags {
#define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node)
enum {
LIMIT_LOW,
LIMIT_MAX,
LIMIT_CNT,
};
struct throtl_grp {
/* must be the first member */
struct blkg_policy_data pd;
/* active throtl group service_queue member */
struct rb_node rb_node;
/* throtl_data this group belongs to */
struct throtl_data *td;
/* this group's service queue */
struct throtl_service_queue service_queue;
/*
* qnode_on_self is used when bios are directly queued to this
* throtl_grp so that local bios compete fairly with bios
* dispatched from children. qnode_on_parent is used when bios are
* dispatched from this throtl_grp into its parent and will compete
* with the sibling qnode_on_parents and the parent's
* qnode_on_self.
*/
struct throtl_qnode qnode_on_self[2];
struct throtl_qnode qnode_on_parent[2];
/*
* Dispatch time in jiffies. This is the estimated time when group
* will unthrottle and is ready to dispatch more bio. It is used as
* key to sort active groups in service tree.
*/
unsigned long disptime;
unsigned int flags;
/* are there any throtl rules between this group and td? */
bool has_rules[2];
/* internally used bytes per second rate limits */
uint64_t bps[2][LIMIT_CNT];
/* user configured bps limits */
uint64_t bps_conf[2][LIMIT_CNT];
/* internally used IOPS limits */
unsigned int iops[2][LIMIT_CNT];
/* user configured IOPS limits */
unsigned int iops_conf[2][LIMIT_CNT];
/* Number of bytes dispatched in current slice */
uint64_t bytes_disp[2];
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
unsigned long last_low_overflow_time[2];
uint64_t last_bytes_disp[2];
unsigned int last_io_disp[2];
unsigned long last_check_time;
unsigned long latency_target; /* us */
unsigned long latency_target_conf; /* us */
/* When did we start a new slice */
unsigned long slice_start[2];
unsigned long slice_end[2];
unsigned long last_finish_time; /* ns / 1024 */
unsigned long checked_last_finish_time; /* ns / 1024 */
unsigned long avg_idletime; /* ns / 1024 */
unsigned long idletime_threshold; /* us */
unsigned long idletime_threshold_conf; /* us */
unsigned int bio_cnt; /* total bios */
unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
unsigned long bio_cnt_reset_time;
atomic_t io_split_cnt[2];
atomic_t last_io_split_cnt[2];
struct blkg_rwstat stat_bytes;
struct blkg_rwstat stat_ios;
};
/* We measure latency for request size from <= 4k to >= 1M */
#define LATENCY_BUCKET_SIZE 9
@ -231,16 +94,6 @@ struct throtl_data
static void throtl_pending_timer_fn(struct timer_list *t);
static inline struct throtl_grp *pd_to_tg(struct blkg_policy_data *pd)
{
return pd ? container_of(pd, struct throtl_grp, pd) : NULL;
}
static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg)
{
return pd_to_tg(blkg_to_pd(blkg, &blkcg_policy_throtl));
}
static inline struct blkcg_gq *tg_to_blkg(struct throtl_grp *tg)
{
return pd_to_blkg(&tg->pd);
@ -1794,7 +1647,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
cancel_work_sync(&td->dispatch_work);
}
static struct blkcg_policy blkcg_policy_throtl = {
struct blkcg_policy blkcg_policy_throtl = {
.dfl_cftypes = throtl_files,
.legacy_cftypes = throtl_legacy_files,
@ -2208,9 +2061,9 @@ void blk_throtl_charge_bio_split(struct bio *bio)
} while (parent);
}
bool blk_throtl_bio(struct bio *bio)
bool __blk_throtl_bio(struct bio *bio)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
struct blkcg_gq *blkg = bio->bi_blkg;
struct throtl_qnode *qn = NULL;
struct throtl_grp *tg = blkg_to_tg(blkg);
@ -2221,19 +2074,12 @@ bool blk_throtl_bio(struct bio *bio)
rcu_read_lock();
/* see throtl_charge_bio() */
if (bio_flagged(bio, BIO_THROTTLED))
goto out;
if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) {
blkg_rwstat_add(&tg->stat_bytes, bio->bi_opf,
bio->bi_iter.bi_size);
blkg_rwstat_add(&tg->stat_ios, bio->bi_opf, 1);
}
if (!tg->has_rules[rw])
goto out;
spin_lock_irq(&q->queue_lock);
throtl_update_latency_buckets(td);
@ -2317,7 +2163,6 @@ again:
out_unlock:
spin_unlock_irq(&q->queue_lock);
out:
bio_set_flag(bio, BIO_THROTTLED);
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW

182
block/blk-throttle.h Normal file
View File

@ -0,0 +1,182 @@
#ifndef BLK_THROTTLE_H
#define BLK_THROTTLE_H
#include "blk-cgroup-rwstat.h"
/*
* To implement hierarchical throttling, throtl_grps form a tree and bios
* are dispatched upwards level by level until they reach the top and get
* issued. When dispatching bios from the children and local group at each
* level, if the bios are dispatched into a single bio_list, there's a risk
* of a local or child group which can queue many bios at once filling up
* the list starving others.
*
* To avoid such starvation, dispatched bios are queued separately
* according to where they came from. When they are again dispatched to
* the parent, they're popped in round-robin order so that no single source
* hogs the dispatch window.
*
* throtl_qnode is used to keep the queued bios separated by their sources.
* Bios are queued to throtl_qnode which in turn is queued to
* throtl_service_queue and then dispatched in round-robin order.
*
* It's also used to track the reference counts on blkg's. A qnode always
* belongs to a throtl_grp and gets queued on itself or the parent, so
* incrementing the reference of the associated throtl_grp when a qnode is
* queued and decrementing when dequeued is enough to keep the whole blkg
* tree pinned while bios are in flight.
*/
struct throtl_qnode {
struct list_head node; /* service_queue->queued[] */
struct bio_list bios; /* queued bios */
struct throtl_grp *tg; /* tg this qnode belongs to */
};
struct throtl_service_queue {
struct throtl_service_queue *parent_sq; /* the parent service_queue */
/*
* Bios queued directly to this service_queue or dispatched from
* children throtl_grp's.
*/
struct list_head queued[2]; /* throtl_qnode [READ/WRITE] */
unsigned int nr_queued[2]; /* number of queued bios */
/*
* RB tree of active children throtl_grp's, which are sorted by
* their ->disptime.
*/
struct rb_root_cached pending_tree; /* RB tree of active tgs */
unsigned int nr_pending; /* # queued in the tree */
unsigned long first_pending_disptime; /* disptime of the first tg */
struct timer_list pending_timer; /* fires on first_pending_disptime */
};
enum {
LIMIT_LOW,
LIMIT_MAX,
LIMIT_CNT,
};
struct throtl_grp {
/* must be the first member */
struct blkg_policy_data pd;
/* active throtl group service_queue member */
struct rb_node rb_node;
/* throtl_data this group belongs to */
struct throtl_data *td;
/* this group's service queue */
struct throtl_service_queue service_queue;
/*
* qnode_on_self is used when bios are directly queued to this
* throtl_grp so that local bios compete fairly with bios
* dispatched from children. qnode_on_parent is used when bios are
* dispatched from this throtl_grp into its parent and will compete
* with the sibling qnode_on_parents and the parent's
* qnode_on_self.
*/
struct throtl_qnode qnode_on_self[2];
struct throtl_qnode qnode_on_parent[2];
/*
* Dispatch time in jiffies. This is the estimated time when group
* will unthrottle and is ready to dispatch more bio. It is used as
* key to sort active groups in service tree.
*/
unsigned long disptime;
unsigned int flags;
/* are there any throtl rules between this group and td? */
bool has_rules[2];
/* internally used bytes per second rate limits */
uint64_t bps[2][LIMIT_CNT];
/* user configured bps limits */
uint64_t bps_conf[2][LIMIT_CNT];
/* internally used IOPS limits */
unsigned int iops[2][LIMIT_CNT];
/* user configured IOPS limits */
unsigned int iops_conf[2][LIMIT_CNT];
/* Number of bytes dispatched in current slice */
uint64_t bytes_disp[2];
/* Number of bio's dispatched in current slice */
unsigned int io_disp[2];
unsigned long last_low_overflow_time[2];
uint64_t last_bytes_disp[2];
unsigned int last_io_disp[2];
unsigned long last_check_time;
unsigned long latency_target; /* us */
unsigned long latency_target_conf; /* us */
/* When did we start a new slice */
unsigned long slice_start[2];
unsigned long slice_end[2];
unsigned long last_finish_time; /* ns / 1024 */
unsigned long checked_last_finish_time; /* ns / 1024 */
unsigned long avg_idletime; /* ns / 1024 */
unsigned long idletime_threshold; /* us */
unsigned long idletime_threshold_conf; /* us */
unsigned int bio_cnt; /* total bios */
unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
unsigned long bio_cnt_reset_time;
atomic_t io_split_cnt[2];
atomic_t last_io_split_cnt[2];
struct blkg_rwstat stat_bytes;
struct blkg_rwstat stat_ios;
};
extern struct blkcg_policy blkcg_policy_throtl;
static inline struct throtl_grp *pd_to_tg(struct blkg_policy_data *pd)
{
return pd ? container_of(pd, struct throtl_grp, pd) : NULL;
}
static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg)
{
return pd_to_tg(blkg_to_pd(blkg, &blkcg_policy_throtl));
}
/*
* Internal throttling interface
*/
#ifndef CONFIG_BLK_DEV_THROTTLING
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
static inline void blk_throtl_exit(struct request_queue *q) { }
static inline void blk_throtl_register_queue(struct request_queue *q) { }
static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
static inline bool blk_throtl_bio(struct bio *bio) { return false; }
#else /* CONFIG_BLK_DEV_THROTTLING */
int blk_throtl_init(struct request_queue *q);
void blk_throtl_exit(struct request_queue *q);
void blk_throtl_register_queue(struct request_queue *q);
void blk_throtl_charge_bio_split(struct bio *bio);
bool __blk_throtl_bio(struct bio *bio);
static inline bool blk_throtl_bio(struct bio *bio)
{
struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
if (bio_flagged(bio, BIO_THROTTLED))
return false;
if (!tg->has_rules[bio_data_dir(bio)])
return false;
return __blk_throtl_bio(bio);
}
#endif /* CONFIG_BLK_DEV_THROTTLING */
#endif

View File

@ -357,6 +357,9 @@ static void wb_timer_fn(struct blk_stat_callback *cb)
unsigned int inflight = wbt_inflight(rwb);
int status;
if (!rwb->rqos.q->disk)
return;
status = latency_exceeded(rwb, cb->stat);
trace_wbt_timer(rwb->rqos.q->disk->bdi, status, rqd->scale_step,

View File

@ -12,6 +12,8 @@
#include "blk-mq.h"
#include "blk-mq-sched.h"
struct elevator_type;
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
@ -94,6 +96,44 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
return __bvec_gap_to_prev(q, bprv, offset);
}
static inline bool rq_mergeable(struct request *rq)
{
if (blk_rq_is_passthrough(rq))
return false;
if (req_op(rq) == REQ_OP_FLUSH)
return false;
if (req_op(rq) == REQ_OP_WRITE_ZEROES)
return false;
if (req_op(rq) == REQ_OP_ZONE_APPEND)
return false;
if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
return false;
if (rq->rq_flags & RQF_NOMERGE_FLAGS)
return false;
return true;
}
/*
* There are two different ways to handle DISCARD merges:
* 1) If max_discard_segments > 1, the driver treats every bio as a range and
* send the bios to controller together. The ranges don't need to be
* contiguous.
* 2) Otherwise, the request will be normal read/write requests. The ranges
* need to be contiguous.
*/
static inline bool blk_discard_mergable(struct request *req)
{
if (req_op(req) == REQ_OP_DISCARD &&
queue_max_discard_segments(req->q) > 1)
return true;
return false;
}
#ifdef CONFIG_BLK_DEV_INTEGRITY
void blk_flush_integrity(void);
bool __bio_integrity_endio(struct bio *);
@ -175,21 +215,28 @@ static inline void blk_integrity_del(struct gendisk *disk)
unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req);
void blk_print_req_error(struct request *req, blk_status_t status);
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq);
unsigned int nr_segs, bool *same_queue_rq);
bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio, unsigned int nr_segs);
void blk_account_io_start(struct request *req);
void blk_account_io_done(struct request *req, u64 now);
void __blk_account_io_start(struct request *req);
void __blk_account_io_done(struct request *req, u64 now);
/*
* Plug flush limits
*/
#define BLK_MAX_REQUEST_COUNT 32
#define BLK_PLUG_FLUSH_SIZE (128 * 1024)
/*
* Internal elevator interface
*/
#define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
void blk_insert_flush(struct request *rq);
bool blk_insert_flush(struct request *rq);
int elevator_switch_mq(struct request_queue *q,
struct elevator_type *new_e);
@ -202,7 +249,7 @@ static inline void elevator_exit(struct request_queue *q,
{
lockdep_assert_held(&q->sysfs_lock);
blk_mq_sched_free_requests(q);
blk_mq_sched_free_rqs(q);
__elevator_exit(q, e);
}
@ -220,7 +267,32 @@ ssize_t part_timeout_show(struct device *, struct device_attribute *, char *);
ssize_t part_timeout_store(struct device *, struct device_attribute *,
const char *, size_t);
void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
{
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
case REQ_OP_WRITE_ZEROES:
case REQ_OP_WRITE_SAME:
return true; /* non-trivial splitting decisions */
default:
break;
}
/*
* All drivers must accept single-segments bios that are <= PAGE_SIZE.
* This is a quick and dirty check that relies on the fact that
* bi_io_vec[0] is always valid if a bio has data. The check might
* lead to occasional false negatives when bios are cloned, but compared
* to the performance impact of cloned bios themselves the loop below
* doesn't matter anyway.
*/
return q->limits.chunk_sectors || bio->bi_vcnt != 1 ||
bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
}
void __blk_queue_split(struct request_queue *q, struct bio **bio,
unsigned int *nr_segs);
int ll_back_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
@ -240,7 +312,25 @@ int blk_dev_init(void);
*/
static inline bool blk_do_io_stat(struct request *rq)
{
return rq->rq_disk && (rq->rq_flags & RQF_IO_STAT);
return (rq->rq_flags & RQF_IO_STAT) && rq->rq_disk;
}
static inline void blk_account_io_done(struct request *req, u64 now)
{
/*
* Account IO completion. flush_rq isn't accounted as a
* normal IO on queueing nor completion. Accounting the
* containing request is enough.
*/
if (blk_do_io_stat(req) && req->part &&
!(req->rq_flags & RQF_FLUSH_SEQ))
__blk_account_io_done(req, now);
}
static inline void blk_account_io_start(struct request *req)
{
if (blk_do_io_stat(req))
__blk_account_io_start(req);
}
static inline void req_set_nomerge(struct request_queue *q, struct request *req)
@ -285,22 +375,6 @@ void ioc_clear_queue(struct request_queue *q);
int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
/*
* Internal throttling interface
*/
#ifdef CONFIG_BLK_DEV_THROTTLING
extern int blk_throtl_init(struct request_queue *q);
extern void blk_throtl_exit(struct request_queue *q);
extern void blk_throtl_register_queue(struct request_queue *q);
extern void blk_throtl_charge_bio_split(struct bio *bio);
bool blk_throtl_bio(struct bio *bio);
#else /* CONFIG_BLK_DEV_THROTTLING */
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
static inline void blk_throtl_exit(struct request_queue *q) { }
static inline void blk_throtl_register_queue(struct request_queue *q) { }
static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
static inline bool blk_throtl_bio(struct bio *bio) { return false; }
#endif /* CONFIG_BLK_DEV_THROTTLING */
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
extern ssize_t blk_throtl_sample_time_show(struct request_queue *q, char *page);
extern ssize_t blk_throtl_sample_time_store(struct request_queue *q,
@ -368,13 +442,20 @@ extern struct device_attribute dev_attr_events;
extern struct device_attribute dev_attr_events_async;
extern struct device_attribute dev_attr_events_poll_msecs;
static inline void bio_clear_hipri(struct bio *bio)
static inline void bio_clear_polled(struct bio *bio)
{
/* can't support alloc cache if we turn off polling */
bio_clear_flag(bio, BIO_PERCPU_CACHE);
bio->bi_opf &= ~REQ_HIPRI;
bio->bi_opf &= ~REQ_POLLED;
}
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
extern const struct address_space_operations def_blk_aops;
int disk_register_independent_access_ranges(struct gendisk *disk,
struct blk_independent_access_ranges *new_iars);
void disk_unregister_independent_access_ranges(struct gendisk *disk);
#endif /* BLK_INTERNAL_H */

View File

@ -14,6 +14,7 @@
#include <linux/pagemap.h>
#include <linux/mempool.h>
#include <linux/blkdev.h>
#include <linux/blk-cgroup.h>
#include <linux/backing-dev.h>
#include <linux/init.h>
#include <linux/hash.h>

View File

@ -26,7 +26,6 @@
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/slab.h>
@ -40,6 +39,7 @@
#include <trace/events/block.h>
#include "elevator.h"
#include "blk.h"
#include "blk-mq-sched.h"
#include "blk-pm.h"
@ -637,7 +637,7 @@ static struct elevator_type *elevator_get_default(struct request_queue *q)
return NULL;
if (q->nr_hw_queues != 1 &&
!blk_mq_is_sbitmap_shared(q->tag_set->flags))
!blk_mq_is_shared_tags(q->tag_set->flags))
return NULL;
return elevator_get(q, "mq-deadline", false);

View File

@ -1,17 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_ELEVATOR_H
#define _LINUX_ELEVATOR_H
#ifndef _ELEVATOR_H
#define _ELEVATOR_H
#include <linux/percpu.h>
#include <linux/hashtable.h>
#ifdef CONFIG_BLOCK
struct io_cq;
struct elevator_type;
#ifdef CONFIG_BLK_DEBUG_FS
struct blk_mq_debugfs_attr;
#endif
/*
* Return values from elevator merger
@ -162,20 +158,9 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
#define ELEVATOR_INSERT_FLUSH 5
#define ELEVATOR_INSERT_SORT_MERGE 6
#define rq_end_sector(rq) (blk_rq_pos(rq) + blk_rq_sectors(rq))
#define rb_entry_rq(node) rb_entry((node), struct request, rb_node)
#define rq_entry_fifo(ptr) list_entry((ptr), struct request, queuelist)
#define rq_fifo_clear(rq) list_del_init(&(rq)->queuelist)
/*
* Elevator features.
*/
/* Supports zoned block devices sequential write constraint */
#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)
/* Supports scheduling on multiple hardware queues */
#define ELEVATOR_F_MQ_AWARE (1U << 1)
#endif /* CONFIG_BLOCK */
#endif
#endif /* _ELEVATOR_H */

View File

@ -17,7 +17,7 @@
#include <linux/fs.h>
#include "blk.h"
static struct inode *bdev_file_inode(struct file *file)
static inline struct inode *bdev_file_inode(struct file *file)
{
return file->f_mapping->host;
}
@ -54,14 +54,12 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
struct iov_iter *iter, unsigned int nr_pages)
{
struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
struct block_device *bdev = iocb->ki_filp->private_data;
struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs;
loff_t pos = iocb->ki_pos;
bool should_dirty = false;
struct bio bio;
ssize_t ret;
blk_qc_t qc;
if ((pos | iov_iter_alignment(iter)) &
(bdev_logical_block_size(bdev) - 1))
@ -78,7 +76,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
bio_init(&bio, vecs, nr_pages);
bio_set_dev(&bio, bdev);
bio.bi_iter.bi_sector = pos >> 9;
bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio.bi_write_hint = iocb->ki_hint;
bio.bi_private = current;
bio.bi_end_io = blkdev_bio_end_io_simple;
@ -102,13 +100,12 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_HIPRI)
bio_set_polled(&bio, iocb);
qc = submit_bio(&bio);
submit_bio(&bio);
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!READ_ONCE(bio.bi_private))
break;
if (!(iocb->ki_flags & IOCB_HIPRI) ||
!blk_poll(bdev_get_queue(bdev), qc, true))
if (!(iocb->ki_flags & IOCB_HIPRI) || !bio_poll(&bio, NULL, 0))
blk_io_schedule();
}
__set_current_state(TASK_RUNNING);
@ -126,6 +123,11 @@ out:
return ret;
}
enum {
DIO_SHOULD_DIRTY = 1,
DIO_IS_SYNC = 2,
};
struct blkdev_dio {
union {
struct kiocb *iocb;
@ -133,35 +135,27 @@ struct blkdev_dio {
};
size_t size;
atomic_t ref;
bool multi_bio : 1;
bool should_dirty : 1;
bool is_sync : 1;
struct bio bio;
unsigned int flags;
struct bio bio ____cacheline_aligned_in_smp;
};
static struct bio_set blkdev_dio_pool;
static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
{
struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
struct request_queue *q = bdev_get_queue(bdev);
return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
}
static void blkdev_bio_end_io(struct bio *bio)
{
struct blkdev_dio *dio = bio->bi_private;
bool should_dirty = dio->should_dirty;
bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
if (bio->bi_status && !dio->bio.bi_status)
dio->bio.bi_status = bio->bi_status;
if (!dio->multi_bio || atomic_dec_and_test(&dio->ref)) {
if (!dio->is_sync) {
if (atomic_dec_and_test(&dio->ref)) {
if (!(dio->flags & DIO_IS_SYNC)) {
struct kiocb *iocb = dio->iocb;
ssize_t ret;
WRITE_ONCE(iocb->private, NULL);
if (likely(!dio->bio.bi_status)) {
ret = dio->size;
iocb->ki_pos += ret;
@ -170,8 +164,7 @@ static void blkdev_bio_end_io(struct bio *bio)
}
dio->iocb->ki_complete(iocb, ret, 0);
if (dio->multi_bio)
bio_put(&dio->bio);
bio_put(&dio->bio);
} else {
struct task_struct *waiter = dio->waiter;
@ -191,16 +184,12 @@ static void blkdev_bio_end_io(struct bio *bio)
static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
unsigned int nr_pages)
{
struct file *file = iocb->ki_filp;
struct inode *inode = bdev_file_inode(file);
struct block_device *bdev = I_BDEV(inode);
struct block_device *bdev = iocb->ki_filp->private_data;
struct blk_plug plug;
struct blkdev_dio *dio;
struct bio *bio;
bool is_poll = (iocb->ki_flags & IOCB_HIPRI) != 0;
bool is_read = (iov_iter_rw(iter) == READ), is_sync;
loff_t pos = iocb->ki_pos;
blk_qc_t qc = BLK_QC_T_NONE;
int ret = 0;
if ((pos | iov_iter_alignment(iter)) &
@ -210,28 +199,31 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
dio = container_of(bio, struct blkdev_dio, bio);
dio->is_sync = is_sync = is_sync_kiocb(iocb);
if (dio->is_sync) {
atomic_set(&dio->ref, 1);
/*
* Grab an extra reference to ensure the dio structure which is embedded
* into the first bio stays around.
*/
bio_get(bio);
is_sync = is_sync_kiocb(iocb);
if (is_sync) {
dio->flags = DIO_IS_SYNC;
dio->waiter = current;
bio_get(bio);
} else {
dio->flags = 0;
dio->iocb = iocb;
}
dio->size = 0;
dio->multi_bio = false;
dio->should_dirty = is_read && iter_is_iovec(iter);
if (is_read && iter_is_iovec(iter))
dio->flags |= DIO_SHOULD_DIRTY;
/*
* Don't plug for HIPRI/polled IO, as those should go straight
* to issue
*/
if (!is_poll)
blk_start_plug(&plug);
blk_start_plug(&plug);
for (;;) {
bio_set_dev(bio, bdev);
bio->bi_iter.bi_sector = pos >> 9;
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio->bi_write_hint = iocb->ki_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
@ -246,7 +238,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
if (is_read) {
bio->bi_opf = REQ_OP_READ;
if (dio->should_dirty)
if (dio->flags & DIO_SHOULD_DIRTY)
bio_set_pages_dirty(bio);
} else {
bio->bi_opf = dio_bio_write_op(iocb);
@ -260,40 +252,15 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS);
if (!nr_pages) {
bool polled = false;
if (iocb->ki_flags & IOCB_HIPRI) {
bio_set_polled(bio, iocb);
polled = true;
}
qc = submit_bio(bio);
if (polled)
WRITE_ONCE(iocb->ki_cookie, qc);
submit_bio(bio);
break;
}
if (!dio->multi_bio) {
/*
* AIO needs an extra reference to ensure the dio
* structure which is embedded into the first bio
* stays around.
*/
if (!is_sync)
bio_get(bio);
dio->multi_bio = true;
atomic_set(&dio->ref, 2);
} else {
atomic_inc(&dio->ref);
}
atomic_inc(&dio->ref);
submit_bio(bio);
bio = bio_alloc(GFP_KERNEL, nr_pages);
}
if (!is_poll)
blk_finish_plug(&plug);
blk_finish_plug(&plug);
if (!is_sync)
return -EIOCBQUEUED;
@ -302,10 +269,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
set_current_state(TASK_UNINTERRUPTIBLE);
if (!READ_ONCE(dio->waiter))
break;
if (!(iocb->ki_flags & IOCB_HIPRI) ||
!blk_poll(bdev_get_queue(bdev), qc, true))
blk_io_schedule();
blk_io_schedule();
}
__set_current_state(TASK_RUNNING);
@ -318,6 +282,94 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
return ret;
}
static void blkdev_bio_end_io_async(struct bio *bio)
{
struct blkdev_dio *dio = container_of(bio, struct blkdev_dio, bio);
struct kiocb *iocb = dio->iocb;
ssize_t ret;
if (likely(!bio->bi_status)) {
ret = dio->size;
iocb->ki_pos += ret;
} else {
ret = blk_status_to_errno(bio->bi_status);
}
iocb->ki_complete(iocb, ret, 0);
if (dio->flags & DIO_SHOULD_DIRTY) {
bio_check_pages_dirty(bio);
} else {
bio_release_pages(bio, false);
bio_put(bio);
}
}
static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
struct iov_iter *iter,
unsigned int nr_pages)
{
struct block_device *bdev = iocb->ki_filp->private_data;
struct blkdev_dio *dio;
struct bio *bio;
loff_t pos = iocb->ki_pos;
int ret = 0;
if ((pos | iov_iter_alignment(iter)) &
(bdev_logical_block_size(bdev) - 1))
return -EINVAL;
bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
dio = container_of(bio, struct blkdev_dio, bio);
dio->flags = 0;
dio->iocb = iocb;
bio_set_dev(bio, bdev);
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio->bi_write_hint = iocb->ki_hint;
bio->bi_end_io = blkdev_bio_end_io_async;
bio->bi_ioprio = iocb->ki_ioprio;
if (iov_iter_is_bvec(iter)) {
/*
* Users don't rely on the iterator being in any particular
* state for async I/O returning -EIOCBQUEUED, hence we can
* avoid expensive iov_iter_advance(). Bypass
* bio_iov_iter_get_pages() and set the bvec directly.
*/
bio_iov_bvec_set(bio, iter);
} else {
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
return ret;
}
}
dio->size = bio->bi_iter.bi_size;
if (iov_iter_rw(iter) == READ) {
bio->bi_opf = REQ_OP_READ;
if (iter_is_iovec(iter)) {
dio->flags |= DIO_SHOULD_DIRTY;
bio_set_pages_dirty(bio);
}
} else {
bio->bi_opf = dio_bio_write_op(iocb);
task_io_account_write(bio->bi_iter.bi_size);
}
if (iocb->ki_flags & IOCB_HIPRI) {
bio->bi_opf |= REQ_POLLED | REQ_NOWAIT;
submit_bio(bio);
WRITE_ONCE(iocb->private, bio);
} else {
if (iocb->ki_flags & IOCB_NOWAIT)
bio->bi_opf |= REQ_NOWAIT;
submit_bio(bio);
}
return -EIOCBQUEUED;
}
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
unsigned int nr_pages;
@ -326,9 +378,11 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
return 0;
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_VECS)
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
if (likely(nr_pages <= BIO_MAX_VECS)) {
if (is_sync_kiocb(iocb))
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
return __blkdev_direct_IO_async(iocb, iter, nr_pages);
}
return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages));
}
@ -405,8 +459,7 @@ static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
int datasync)
{
struct inode *bd_inode = bdev_file_inode(filp);
struct block_device *bdev = I_BDEV(bd_inode);
struct block_device *bdev = filp->private_data;
int error;
error = file_write_and_wait_range(filp, start, end);
@ -448,6 +501,8 @@ static int blkdev_open(struct inode *inode, struct file *filp)
bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
filp->private_data = bdev;
filp->f_mapping = bdev->bd_inode->i_mapping;
filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
return 0;
@ -455,29 +510,12 @@ static int blkdev_open(struct inode *inode, struct file *filp)
static int blkdev_close(struct inode *inode, struct file *filp)
{
struct block_device *bdev = I_BDEV(bdev_file_inode(filp));
struct block_device *bdev = filp->private_data;
blkdev_put(bdev, filp->f_mode);
return 0;
}
static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
{
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
fmode_t mode = file->f_mode;
/*
* O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
* to updated it before every ioctl.
*/
if (file->f_flags & O_NDELAY)
mode |= FMODE_NDELAY;
else
mode &= ~FMODE_NDELAY;
return blkdev_ioctl(bdev, mode, cmd, arg);
}
/*
* Write data to the block device. Only intended for the block device itself
* and the raw driver which basically is a fake block device.
@ -487,14 +525,14 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
*/
static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct inode *bd_inode = bdev_file_inode(file);
struct block_device *bdev = iocb->ki_filp->private_data;
struct inode *bd_inode = bdev->bd_inode;
loff_t size = i_size_read(bd_inode);
struct blk_plug plug;
size_t shorted = 0;
ssize_t ret;
if (bdev_read_only(I_BDEV(bd_inode)))
if (bdev_read_only(bdev))
return -EPERM;
if (IS_SWAPFILE(bd_inode) && !is_hibernate_resume_dev(bd_inode->i_rdev))
@ -526,24 +564,26 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
struct inode *bd_inode = bdev_file_inode(file);
loff_t size = i_size_read(bd_inode);
struct block_device *bdev = iocb->ki_filp->private_data;
loff_t size = i_size_read(bdev->bd_inode);
loff_t pos = iocb->ki_pos;
size_t shorted = 0;
ssize_t ret;
if (pos >= size)
return 0;
size -= pos;
if (iov_iter_count(to) > size) {
shorted = iov_iter_count(to) - size;
iov_iter_truncate(to, size);
if (unlikely(pos + iov_iter_count(to) > size)) {
if (pos >= size)
return 0;
size -= pos;
if (iov_iter_count(to) > size) {
shorted = iov_iter_count(to) - size;
iov_iter_truncate(to, size);
}
}
ret = generic_file_read_iter(iocb, to);
iov_iter_reexpand(to, iov_iter_count(to) + shorted);
if (unlikely(shorted))
iov_iter_reexpand(to, iov_iter_count(to) + shorted);
return ret;
}
@ -592,16 +632,18 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
switch (mode) {
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
GFP_KERNEL, BLKDEV_ZERO_NOUNMAP);
error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
len >> SECTOR_SHIFT, GFP_KERNEL,
BLKDEV_ZERO_NOUNMAP);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
len >> SECTOR_SHIFT, GFP_KERNEL,
BLKDEV_ZERO_NOFALLBACK);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, 0);
error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
len >> SECTOR_SHIFT, GFP_KERNEL, 0);
break;
default:
error = -EOPNOTSUPP;
@ -618,10 +660,10 @@ const struct file_operations def_blk_fops = {
.llseek = blkdev_llseek,
.read_iter = blkdev_read_iter,
.write_iter = blkdev_write_iter,
.iopoll = blkdev_iopoll,
.iopoll = iocb_bio_iopoll,
.mmap = generic_file_mmap,
.fsync = blkdev_fsync,
.unlocked_ioctl = block_ioctl,
.unlocked_ioctl = blkdev_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = compat_blkdev_ioctl,
#endif

View File

@ -19,6 +19,7 @@
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/kmod.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/idr.h>
#include <linux/log2.h>
@ -625,6 +626,26 @@ void del_gendisk(struct gendisk *disk)
}
EXPORT_SYMBOL(del_gendisk);
/**
* invalidate_disk - invalidate the disk
* @disk: the struct gendisk to invalidate
*
* A helper to invalidates the disk. It will clean the disk's associated
* buffer/page caches and reset its internal states so that the disk
* can be reused by the drivers.
*
* Context: can sleep
*/
void invalidate_disk(struct gendisk *disk)
{
struct block_device *bdev = disk->part0;
invalidate_bdev(bdev);
bdev->bd_inode->i_mapping->wb_err = 0;
set_capacity(disk, 0);
}
EXPORT_SYMBOL(invalidate_disk);
/* sysfs access to bad-blocks list. */
static ssize_t disk_badblocks_show(struct device *dev,
struct device_attribute *attr,
@ -884,7 +905,7 @@ ssize_t part_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct block_device *bdev = dev_to_bdev(dev);
struct request_queue *q = bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bdev);
struct disk_stats stat;
unsigned int inflight;
@ -928,7 +949,7 @@ ssize_t part_inflight_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct block_device *bdev = dev_to_bdev(dev);
struct request_queue *q = bdev->bd_disk->queue;
struct request_queue *q = bdev_get_queue(bdev);
unsigned int inflight[2];
if (queue_is_mq(q))
@ -1268,6 +1289,9 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
if (!disk->bdi)
goto out_free_disk;
/* bdev_alloc() might need the queue, set before the first call */
disk->queue = q;
disk->part0 = bdev_alloc(disk, 0);
if (!disk->part0)
goto out_free_bdi;
@ -1283,7 +1307,6 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
disk_to_dev(disk)->type = &disk_type;
device_initialize(disk_to_dev(disk));
inc_diskseq(disk);
disk->queue = q;
q->disk = disk;
lockdep_init_map(&disk->lockdep_map, "(bio completion)", lkclass, 0);
#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
@ -1388,12 +1411,6 @@ void set_disk_ro(struct gendisk *disk, bool read_only)
}
EXPORT_SYMBOL(set_disk_ro);
int bdev_read_only(struct block_device *bdev)
{
return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
}
EXPORT_SYMBOL(bdev_read_only);
void inc_diskseq(struct gendisk *disk)
{
disk->diskseq = atomic64_inc_return(&diskseq);

View File

@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/genhd.h>
#include <linux/slab.h>
struct bd_holder_disk {
struct list_head list;

View File

@ -538,12 +538,22 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
*
* New commands must be compatible and go into blkdev_common_ioctl
*/
int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
unsigned long arg)
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
{
int ret;
loff_t size;
struct block_device *bdev = I_BDEV(file->f_mapping->host);
void __user *argp = (void __user *)arg;
fmode_t mode = file->f_mode;
loff_t size;
int ret;
/*
* O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
* to updated it before every ioctl.
*/
if (file->f_flags & O_NDELAY)
mode |= FMODE_NDELAY;
else
mode &= ~FMODE_NDELAY;
switch (cmd) {
/* These need separate implementations for the data structure */
@ -588,7 +598,6 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
return -ENOTTY;
return bdev->bd_disk->fops->ioctl(bdev, mode, cmd, arg);
}
EXPORT_SYMBOL_GPL(blkdev_ioctl); /* for /dev/raw */
#ifdef CONFIG_COMPAT

View File

@ -1,578 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2019 Google LLC
*/
/**
* DOC: The Keyslot Manager
*
* Many devices with inline encryption support have a limited number of "slots"
* into which encryption contexts may be programmed, and requests can be tagged
* with a slot number to specify the key to use for en/decryption.
*
* As the number of slots is limited, and programming keys is expensive on
* many inline encryption hardware, we don't want to program the same key into
* multiple slots - if multiple requests are using the same key, we want to
* program just one slot with that key and use that slot for all requests.
*
* The keyslot manager manages these keyslots appropriately, and also acts as
* an abstraction between the inline encryption hardware and the upper layers.
*
* Lower layer devices will set up a keyslot manager in their request queue
* and tell it how to perform device specific operations like programming/
* evicting keys from keyslots.
*
* Upper layers will call blk_ksm_get_slot_for_key() to program a
* key into some slot in the inline encryption hardware.
*/
#define pr_fmt(fmt) "blk-crypto: " fmt
#include <linux/keyslot-manager.h>
#include <linux/device.h>
#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/wait.h>
#include <linux/blkdev.h>
struct blk_ksm_keyslot {
atomic_t slot_refs;
struct list_head idle_slot_node;
struct hlist_node hash_node;
const struct blk_crypto_key *key;
struct blk_keyslot_manager *ksm;
};
static inline void blk_ksm_hw_enter(struct blk_keyslot_manager *ksm)
{
/*
* Calling into the driver requires ksm->lock held and the device
* resumed. But we must resume the device first, since that can acquire
* and release ksm->lock via blk_ksm_reprogram_all_keys().
*/
if (ksm->dev)
pm_runtime_get_sync(ksm->dev);
down_write(&ksm->lock);
}
static inline void blk_ksm_hw_exit(struct blk_keyslot_manager *ksm)
{
up_write(&ksm->lock);
if (ksm->dev)
pm_runtime_put_sync(ksm->dev);
}
static inline bool blk_ksm_is_passthrough(struct blk_keyslot_manager *ksm)
{
return ksm->num_slots == 0;
}
/**
* blk_ksm_init() - Initialize a keyslot manager
* @ksm: The keyslot_manager to initialize.
* @num_slots: The number of key slots to manage.
*
* Allocate memory for keyslots and initialize a keyslot manager. Called by
* e.g. storage drivers to set up a keyslot manager in their request_queue.
*
* Return: 0 on success, or else a negative error code.
*/
int blk_ksm_init(struct blk_keyslot_manager *ksm, unsigned int num_slots)
{
unsigned int slot;
unsigned int i;
unsigned int slot_hashtable_size;
memset(ksm, 0, sizeof(*ksm));
if (num_slots == 0)
return -EINVAL;
ksm->slots = kvcalloc(num_slots, sizeof(ksm->slots[0]), GFP_KERNEL);
if (!ksm->slots)
return -ENOMEM;
ksm->num_slots = num_slots;
init_rwsem(&ksm->lock);
init_waitqueue_head(&ksm->idle_slots_wait_queue);
INIT_LIST_HEAD(&ksm->idle_slots);
for (slot = 0; slot < num_slots; slot++) {
ksm->slots[slot].ksm = ksm;
list_add_tail(&ksm->slots[slot].idle_slot_node,
&ksm->idle_slots);
}
spin_lock_init(&ksm->idle_slots_lock);
slot_hashtable_size = roundup_pow_of_two(num_slots);
/*
* hash_ptr() assumes bits != 0, so ensure the hash table has at least 2
* buckets. This only makes a difference when there is only 1 keyslot.
*/
if (slot_hashtable_size < 2)
slot_hashtable_size = 2;
ksm->log_slot_ht_size = ilog2(slot_hashtable_size);
ksm->slot_hashtable = kvmalloc_array(slot_hashtable_size,
sizeof(ksm->slot_hashtable[0]),
GFP_KERNEL);
if (!ksm->slot_hashtable)
goto err_destroy_ksm;
for (i = 0; i < slot_hashtable_size; i++)
INIT_HLIST_HEAD(&ksm->slot_hashtable[i]);
return 0;
err_destroy_ksm:
blk_ksm_destroy(ksm);
return -ENOMEM;
}
EXPORT_SYMBOL_GPL(blk_ksm_init);
static void blk_ksm_destroy_callback(void *ksm)
{
blk_ksm_destroy(ksm);
}
/**
* devm_blk_ksm_init() - Resource-managed blk_ksm_init()
* @dev: The device which owns the blk_keyslot_manager.
* @ksm: The blk_keyslot_manager to initialize.
* @num_slots: The number of key slots to manage.
*
* Like blk_ksm_init(), but causes blk_ksm_destroy() to be called automatically
* on driver detach.
*
* Return: 0 on success, or else a negative error code.
*/
int devm_blk_ksm_init(struct device *dev, struct blk_keyslot_manager *ksm,
unsigned int num_slots)
{
int err = blk_ksm_init(ksm, num_slots);
if (err)
return err;
return devm_add_action_or_reset(dev, blk_ksm_destroy_callback, ksm);
}
EXPORT_SYMBOL_GPL(devm_blk_ksm_init);
static inline struct hlist_head *
blk_ksm_hash_bucket_for_key(struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key)
{
return &ksm->slot_hashtable[hash_ptr(key, ksm->log_slot_ht_size)];
}
static void blk_ksm_remove_slot_from_lru_list(struct blk_ksm_keyslot *slot)
{
struct blk_keyslot_manager *ksm = slot->ksm;
unsigned long flags;
spin_lock_irqsave(&ksm->idle_slots_lock, flags);
list_del(&slot->idle_slot_node);
spin_unlock_irqrestore(&ksm->idle_slots_lock, flags);
}
static struct blk_ksm_keyslot *blk_ksm_find_keyslot(
struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key)
{
const struct hlist_head *head = blk_ksm_hash_bucket_for_key(ksm, key);
struct blk_ksm_keyslot *slotp;
hlist_for_each_entry(slotp, head, hash_node) {
if (slotp->key == key)
return slotp;
}
return NULL;
}
static struct blk_ksm_keyslot *blk_ksm_find_and_grab_keyslot(
struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key)
{
struct blk_ksm_keyslot *slot;
slot = blk_ksm_find_keyslot(ksm, key);
if (!slot)
return NULL;
if (atomic_inc_return(&slot->slot_refs) == 1) {
/* Took first reference to this slot; remove it from LRU list */
blk_ksm_remove_slot_from_lru_list(slot);
}
return slot;
}
unsigned int blk_ksm_get_slot_idx(struct blk_ksm_keyslot *slot)
{
return slot - slot->ksm->slots;
}
EXPORT_SYMBOL_GPL(blk_ksm_get_slot_idx);
/**
* blk_ksm_get_slot_for_key() - Program a key into a keyslot.
* @ksm: The keyslot manager to program the key into.
* @key: Pointer to the key object to program, including the raw key, crypto
* mode, and data unit size.
* @slot_ptr: A pointer to return the pointer of the allocated keyslot.
*
* Get a keyslot that's been programmed with the specified key. If one already
* exists, return it with incremented refcount. Otherwise, wait for a keyslot
* to become idle and program it.
*
* Context: Process context. Takes and releases ksm->lock.
* Return: BLK_STS_OK on success (and keyslot is set to the pointer of the
* allocated keyslot), or some other blk_status_t otherwise (and
* keyslot is set to NULL).
*/
blk_status_t blk_ksm_get_slot_for_key(struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key,
struct blk_ksm_keyslot **slot_ptr)
{
struct blk_ksm_keyslot *slot;
int slot_idx;
int err;
*slot_ptr = NULL;
if (blk_ksm_is_passthrough(ksm))
return BLK_STS_OK;
down_read(&ksm->lock);
slot = blk_ksm_find_and_grab_keyslot(ksm, key);
up_read(&ksm->lock);
if (slot)
goto success;
for (;;) {
blk_ksm_hw_enter(ksm);
slot = blk_ksm_find_and_grab_keyslot(ksm, key);
if (slot) {
blk_ksm_hw_exit(ksm);
goto success;
}
/*
* If we're here, that means there wasn't a slot that was
* already programmed with the key. So try to program it.
*/
if (!list_empty(&ksm->idle_slots))
break;
blk_ksm_hw_exit(ksm);
wait_event(ksm->idle_slots_wait_queue,
!list_empty(&ksm->idle_slots));
}
slot = list_first_entry(&ksm->idle_slots, struct blk_ksm_keyslot,
idle_slot_node);
slot_idx = blk_ksm_get_slot_idx(slot);
err = ksm->ksm_ll_ops.keyslot_program(ksm, key, slot_idx);
if (err) {
wake_up(&ksm->idle_slots_wait_queue);
blk_ksm_hw_exit(ksm);
return errno_to_blk_status(err);
}
/* Move this slot to the hash list for the new key. */
if (slot->key)
hlist_del(&slot->hash_node);
slot->key = key;
hlist_add_head(&slot->hash_node, blk_ksm_hash_bucket_for_key(ksm, key));
atomic_set(&slot->slot_refs, 1);
blk_ksm_remove_slot_from_lru_list(slot);
blk_ksm_hw_exit(ksm);
success:
*slot_ptr = slot;
return BLK_STS_OK;
}
/**
* blk_ksm_put_slot() - Release a reference to a slot
* @slot: The keyslot to release the reference of.
*
* Context: Any context.
*/
void blk_ksm_put_slot(struct blk_ksm_keyslot *slot)
{
struct blk_keyslot_manager *ksm;
unsigned long flags;
if (!slot)
return;
ksm = slot->ksm;
if (atomic_dec_and_lock_irqsave(&slot->slot_refs,
&ksm->idle_slots_lock, flags)) {
list_add_tail(&slot->idle_slot_node, &ksm->idle_slots);
spin_unlock_irqrestore(&ksm->idle_slots_lock, flags);
wake_up(&ksm->idle_slots_wait_queue);
}
}
/**
* blk_ksm_crypto_cfg_supported() - Find out if a crypto configuration is
* supported by a ksm.
* @ksm: The keyslot manager to check
* @cfg: The crypto configuration to check for.
*
* Checks for crypto_mode/data unit size/dun bytes support.
*
* Return: Whether or not this ksm supports the specified crypto config.
*/
bool blk_ksm_crypto_cfg_supported(struct blk_keyslot_manager *ksm,
const struct blk_crypto_config *cfg)
{
if (!ksm)
return false;
if (!(ksm->crypto_modes_supported[cfg->crypto_mode] &
cfg->data_unit_size))
return false;
if (ksm->max_dun_bytes_supported < cfg->dun_bytes)
return false;
return true;
}
/**
* blk_ksm_evict_key() - Evict a key from the lower layer device.
* @ksm: The keyslot manager to evict from
* @key: The key to evict
*
* Find the keyslot that the specified key was programmed into, and evict that
* slot from the lower layer device. The slot must not be in use by any
* in-flight IO when this function is called.
*
* Context: Process context. Takes and releases ksm->lock.
* Return: 0 on success or if there's no keyslot with the specified key, -EBUSY
* if the keyslot is still in use, or another -errno value on other
* error.
*/
int blk_ksm_evict_key(struct blk_keyslot_manager *ksm,
const struct blk_crypto_key *key)
{
struct blk_ksm_keyslot *slot;
int err = 0;
if (blk_ksm_is_passthrough(ksm)) {
if (ksm->ksm_ll_ops.keyslot_evict) {
blk_ksm_hw_enter(ksm);
err = ksm->ksm_ll_ops.keyslot_evict(ksm, key, -1);
blk_ksm_hw_exit(ksm);
return err;
}
return 0;
}
blk_ksm_hw_enter(ksm);
slot = blk_ksm_find_keyslot(ksm, key);
if (!slot)
goto out_unlock;
if (WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)) {
err = -EBUSY;
goto out_unlock;
}
err = ksm->ksm_ll_ops.keyslot_evict(ksm, key,
blk_ksm_get_slot_idx(slot));
if (err)
goto out_unlock;
hlist_del(&slot->hash_node);
slot->key = NULL;
err = 0;
out_unlock:
blk_ksm_hw_exit(ksm);
return err;
}
/**
* blk_ksm_reprogram_all_keys() - Re-program all keyslots.
* @ksm: The keyslot manager
*
* Re-program all keyslots that are supposed to have a key programmed. This is
* intended only for use by drivers for hardware that loses its keys on reset.
*
* Context: Process context. Takes and releases ksm->lock.
*/
void blk_ksm_reprogram_all_keys(struct blk_keyslot_manager *ksm)
{
unsigned int slot;
if (blk_ksm_is_passthrough(ksm))
return;
/* This is for device initialization, so don't resume the device */
down_write(&ksm->lock);
for (slot = 0; slot < ksm->num_slots; slot++) {
const struct blk_crypto_key *key = ksm->slots[slot].key;
int err;
if (!key)
continue;
err = ksm->ksm_ll_ops.keyslot_program(ksm, key, slot);
WARN_ON(err);
}
up_write(&ksm->lock);
}
EXPORT_SYMBOL_GPL(blk_ksm_reprogram_all_keys);
void blk_ksm_destroy(struct blk_keyslot_manager *ksm)
{
if (!ksm)
return;
kvfree(ksm->slot_hashtable);
kvfree_sensitive(ksm->slots, sizeof(ksm->slots[0]) * ksm->num_slots);
memzero_explicit(ksm, sizeof(*ksm));
}
EXPORT_SYMBOL_GPL(blk_ksm_destroy);
bool blk_ksm_register(struct blk_keyslot_manager *ksm, struct request_queue *q)
{
if (blk_integrity_queue_supports_integrity(q)) {
pr_warn("Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
return false;
}
q->ksm = ksm;
return true;
}
EXPORT_SYMBOL_GPL(blk_ksm_register);
void blk_ksm_unregister(struct request_queue *q)
{
q->ksm = NULL;
}
/**
* blk_ksm_intersect_modes() - restrict supported modes by child device
* @parent: The keyslot manager for parent device
* @child: The keyslot manager for child device, or NULL
*
* Clear any crypto mode support bits in @parent that aren't set in @child.
* If @child is NULL, then all parent bits are cleared.
*
* Only use this when setting up the keyslot manager for a layered device,
* before it's been exposed yet.
*/
void blk_ksm_intersect_modes(struct blk_keyslot_manager *parent,
const struct blk_keyslot_manager *child)
{
if (child) {
unsigned int i;
parent->max_dun_bytes_supported =
min(parent->max_dun_bytes_supported,
child->max_dun_bytes_supported);
for (i = 0; i < ARRAY_SIZE(child->crypto_modes_supported);
i++) {
parent->crypto_modes_supported[i] &=
child->crypto_modes_supported[i];
}
} else {
parent->max_dun_bytes_supported = 0;
memset(parent->crypto_modes_supported, 0,
sizeof(parent->crypto_modes_supported));
}
}
EXPORT_SYMBOL_GPL(blk_ksm_intersect_modes);
/**
* blk_ksm_is_superset() - Check if a KSM supports a superset of crypto modes
* and DUN bytes that another KSM supports. Here,
* "superset" refers to the mathematical meaning of the
* word - i.e. if two KSMs have the *same* capabilities,
* they *are* considered supersets of each other.
* @ksm_superset: The KSM that we want to verify is a superset
* @ksm_subset: The KSM that we want to verify is a subset
*
* Return: True if @ksm_superset supports a superset of the crypto modes and DUN
* bytes that @ksm_subset supports.
*/
bool blk_ksm_is_superset(struct blk_keyslot_manager *ksm_superset,
struct blk_keyslot_manager *ksm_subset)
{
int i;
if (!ksm_subset)
return true;
if (!ksm_superset)
return false;
for (i = 0; i < ARRAY_SIZE(ksm_superset->crypto_modes_supported); i++) {
if (ksm_subset->crypto_modes_supported[i] &
(~ksm_superset->crypto_modes_supported[i])) {
return false;
}
}
if (ksm_subset->max_dun_bytes_supported >
ksm_superset->max_dun_bytes_supported) {
return false;
}
return true;
}
EXPORT_SYMBOL_GPL(blk_ksm_is_superset);
/**
* blk_ksm_update_capabilities() - Update the restrictions of a KSM to those of
* another KSM
* @target_ksm: The KSM whose restrictions to update.
* @reference_ksm: The KSM to whose restrictions this function will update
* @target_ksm's restrictions to.
*
* Blk-crypto requires that crypto capabilities that were
* advertised when a bio was created continue to be supported by the
* device until that bio is ended. This is turn means that a device cannot
* shrink its advertised crypto capabilities without any explicit
* synchronization with upper layers. So if there's no such explicit
* synchronization, @reference_ksm must support all the crypto capabilities that
* @target_ksm does
* (i.e. we need blk_ksm_is_superset(@reference_ksm, @target_ksm) == true).
*
* Note also that as long as the crypto capabilities are being expanded, the
* order of updates becoming visible is not important because it's alright
* for blk-crypto to see stale values - they only cause blk-crypto to
* believe that a crypto capability isn't supported when it actually is (which
* might result in blk-crypto-fallback being used if available, or the bio being
* failed).
*/
void blk_ksm_update_capabilities(struct blk_keyslot_manager *target_ksm,
struct blk_keyslot_manager *reference_ksm)
{
memcpy(target_ksm->crypto_modes_supported,
reference_ksm->crypto_modes_supported,
sizeof(target_ksm->crypto_modes_supported));
target_ksm->max_dun_bytes_supported =
reference_ksm->max_dun_bytes_supported;
}
EXPORT_SYMBOL_GPL(blk_ksm_update_capabilities);
/**
* blk_ksm_init_passthrough() - Init a passthrough keyslot manager
* @ksm: The keyslot manager to init
*
* Initialize a passthrough keyslot manager.
* Called by e.g. storage drivers to set up a keyslot manager in their
* request_queue, when the storage driver wants to manage its keys by itself.
* This is useful for inline encryption hardware that doesn't have the concept
* of keyslots, and for layered devices.
*/
void blk_ksm_init_passthrough(struct blk_keyslot_manager *ksm)
{
memset(ksm, 0, sizeof(*ksm));
init_rwsem(&ksm->lock);
}
EXPORT_SYMBOL_GPL(blk_ksm_init_passthrough);

View File

@ -9,12 +9,12 @@
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/elevator.h>
#include <linux/module.h>
#include <linux/sbitmap.h>
#include <trace/events/block.h>
#include "elevator.h"
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-debugfs.h"
@ -453,11 +453,11 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
{
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
struct blk_mq_tags *tags = hctx->sched_tags;
unsigned int shift = tags->bitmap_tags->sb.shift;
unsigned int shift = tags->bitmap_tags.sb.shift;
kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
}
static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)

View File

@ -9,7 +9,6 @@
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/slab.h>
@ -20,6 +19,7 @@
#include <trace/events/block.h>
#include "elevator.h"
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-debugfs.h"
@ -31,6 +31,11 @@
*/
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
/*
* Time after which to dispatch lower priority requests even if higher
* priority requests are pending.
*/
static const int prio_aging_expire = 10 * HZ;
static const int writes_starved = 2; /* max times reads can starve a write */
static const int fifo_batch = 16; /* # of sequential requests treated as one
by the above parameters. For throughput. */
@ -51,17 +56,16 @@ enum dd_prio {
enum { DD_PRIO_COUNT = 3 };
/* I/O statistics per I/O priority. */
/*
* I/O statistics per I/O priority. It is fine if these counters overflow.
* What matters is that these counters are at least as wide as
* log2(max_outstanding_requests).
*/
struct io_stats_per_prio {
local_t inserted;
local_t merged;
local_t dispatched;
local_t completed;
};
/* I/O statistics for all I/O priorities (enum dd_prio). */
struct io_stats {
struct io_stats_per_prio stats[DD_PRIO_COUNT];
uint32_t inserted;
uint32_t merged;
uint32_t dispatched;
atomic_t completed;
};
/*
@ -74,6 +78,7 @@ struct dd_per_prio {
struct list_head fifo_list[DD_DIR_COUNT];
/* Next request in FIFO order. Read, write or both are NULL. */
struct request *next_rq[DD_DIR_COUNT];
struct io_stats_per_prio stats;
};
struct deadline_data {
@ -88,8 +93,6 @@ struct deadline_data {
unsigned int batching; /* number of sequential requests made */
unsigned int starved; /* times reads have starved writes */
struct io_stats __percpu *stats;
/*
* settings that change how the i/o scheduler behaves
*/
@ -98,38 +101,12 @@ struct deadline_data {
int writes_starved;
int front_merges;
u32 async_depth;
int prio_aging_expire;
spinlock_t lock;
spinlock_t zone_lock;
};
/* Count one event of type 'event_type' and with I/O priority 'prio' */
#define dd_count(dd, event_type, prio) do { \
struct io_stats *io_stats = get_cpu_ptr((dd)->stats); \
\
BUILD_BUG_ON(!__same_type((dd), struct deadline_data *)); \
BUILD_BUG_ON(!__same_type((prio), enum dd_prio)); \
local_inc(&io_stats->stats[(prio)].event_type); \
put_cpu_ptr(io_stats); \
} while (0)
/*
* Returns the total number of dd_count(dd, event_type, prio) calls across all
* CPUs. No locking or barriers since it is fine if the returned sum is slightly
* outdated.
*/
#define dd_sum(dd, event_type, prio) ({ \
unsigned int cpu; \
u32 sum = 0; \
\
BUILD_BUG_ON(!__same_type((dd), struct deadline_data *)); \
BUILD_BUG_ON(!__same_type((prio), enum dd_prio)); \
for_each_present_cpu(cpu) \
sum += local_read(&per_cpu_ptr((dd)->stats, cpu)-> \
stats[(prio)].event_type); \
sum; \
})
/* Maps an I/O priority class to a deadline scheduler priority. */
static const enum dd_prio ioprio_class_to_prio[] = {
[IOPRIO_CLASS_NONE] = DD_BE_PRIO,
@ -233,7 +210,9 @@ static void dd_merged_requests(struct request_queue *q, struct request *req,
const u8 ioprio_class = dd_rq_ioclass(next);
const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
dd_count(dd, merged, prio);
lockdep_assert_held(&dd->lock);
dd->per_prio[prio].stats.merged++;
/*
* if next expires before rq, assign its expire time to rq
@ -270,6 +249,16 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
deadline_remove_request(rq->q, per_prio, rq);
}
/* Number of requests queued for a given priority level. */
static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
{
const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
lockdep_assert_held(&dd->lock);
return stats->inserted - atomic_read(&stats->completed);
}
/*
* deadline_check_fifo returns 0 if there are no expired requests on the fifo,
* 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
@ -355,12 +344,27 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
return rq;
}
/*
* Returns true if and only if @rq started after @latest_start where
* @latest_start is in jiffies.
*/
static bool started_after(struct deadline_data *dd, struct request *rq,
unsigned long latest_start)
{
unsigned long start_time = (unsigned long)rq->fifo_time;
start_time -= dd->fifo_expire[rq_data_dir(rq)];
return time_after(start_time, latest_start);
}
/*
* deadline_dispatch_requests selects the best request according to
* read/write expire, fifo_batch, etc
* read/write expire, fifo_batch, etc and with a start time <= @latest_start.
*/
static struct request *__dd_dispatch_request(struct deadline_data *dd,
struct dd_per_prio *per_prio)
struct dd_per_prio *per_prio,
unsigned long latest_start)
{
struct request *rq, *next_rq;
enum dd_data_dir data_dir;
@ -372,6 +376,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
if (!list_empty(&per_prio->dispatch)) {
rq = list_first_entry(&per_prio->dispatch, struct request,
queuelist);
if (started_after(dd, rq, latest_start))
return NULL;
list_del_init(&rq->queuelist);
goto done;
}
@ -449,6 +455,9 @@ dispatch_find_request:
dd->batching = 0;
dispatch_request:
if (started_after(dd, rq, latest_start))
return NULL;
/*
* rq is the selected appropriate request.
*/
@ -457,7 +466,7 @@ dispatch_request:
done:
ioprio_class = dd_rq_ioclass(rq);
prio = ioprio_class_to_prio[ioprio_class];
dd_count(dd, dispatched, prio);
dd->per_prio[prio].stats.dispatched++;
/*
* If the request needs its target zone locked, do it.
*/
@ -466,6 +475,34 @@ done:
return rq;
}
/*
* Check whether there are any requests with priority other than DD_RT_PRIO
* that were inserted more than prio_aging_expire jiffies ago.
*/
static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
unsigned long now)
{
struct request *rq;
enum dd_prio prio;
int prio_cnt;
lockdep_assert_held(&dd->lock);
prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
!!dd_queued(dd, DD_IDLE_PRIO);
if (prio_cnt < 2)
return NULL;
for (prio = DD_BE_PRIO; prio <= DD_PRIO_MAX; prio++) {
rq = __dd_dispatch_request(dd, &dd->per_prio[prio],
now - dd->prio_aging_expire);
if (rq)
return rq;
}
return NULL;
}
/*
* Called from blk_mq_run_hw_queue() -> __blk_mq_sched_dispatch_requests().
*
@ -477,15 +514,26 @@ done:
static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
struct deadline_data *dd = hctx->queue->elevator->elevator_data;
const unsigned long now = jiffies;
struct request *rq;
enum dd_prio prio;
spin_lock(&dd->lock);
rq = dd_dispatch_prio_aged_requests(dd, now);
if (rq)
goto unlock;
/*
* Next, dispatch requests in priority order. Ignore lower priority
* requests if any higher priority requests are pending.
*/
for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
rq = __dd_dispatch_request(dd, &dd->per_prio[prio]);
if (rq)
rq = __dd_dispatch_request(dd, &dd->per_prio[prio], now);
if (rq || dd_queued(dd, prio))
break;
}
unlock:
spin_unlock(&dd->lock);
return rq;
@ -519,7 +567,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, dd->async_depth);
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
}
/* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
@ -536,12 +584,21 @@ static void dd_exit_sched(struct elevator_queue *e)
for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
struct dd_per_prio *per_prio = &dd->per_prio[prio];
const struct io_stats_per_prio *stats = &per_prio->stats;
uint32_t queued;
WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_READ]));
WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_WRITE]));
}
free_percpu(dd->stats);
spin_lock(&dd->lock);
queued = dd_queued(dd, prio);
spin_unlock(&dd->lock);
WARN_ONCE(queued != 0,
"statistics for priority %d: i %u m %u d %u c %u\n",
prio, stats->inserted, stats->merged,
stats->dispatched, atomic_read(&stats->completed));
}
kfree(dd);
}
@ -566,11 +623,6 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
eq->elevator_data = dd;
dd->stats = alloc_percpu_gfp(typeof(*dd->stats),
GFP_KERNEL | __GFP_ZERO);
if (!dd->stats)
goto free_dd;
for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
struct dd_per_prio *per_prio = &dd->per_prio[prio];
@ -586,15 +638,13 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
dd->front_merges = 1;
dd->last_dir = DD_WRITE;
dd->fifo_batch = fifo_batch;
dd->prio_aging_expire = prio_aging_expire;
spin_lock_init(&dd->lock);
spin_lock_init(&dd->zone_lock);
q->elevator = eq;
return 0;
free_dd:
kfree(dd);
put_eq:
kobject_put(&eq->kobj);
return ret;
@ -677,8 +727,11 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
blk_req_zone_write_unlock(rq);
prio = ioprio_class_to_prio[ioprio_class];
dd_count(dd, inserted, prio);
rq->elv.priv[0] = (void *)(uintptr_t)1;
per_prio = &dd->per_prio[prio];
if (!rq->elv.priv[0]) {
per_prio->stats.inserted++;
rq->elv.priv[0] = (void *)(uintptr_t)1;
}
if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
blk_mq_free_requests(&free);
@ -687,7 +740,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
trace_block_rq_insert(rq);
per_prio = &dd->per_prio[prio];
if (at_head) {
list_add(&rq->queuelist, &per_prio->dispatch);
} else {
@ -759,12 +811,13 @@ static void dd_finish_request(struct request *rq)
/*
* The block layer core may call dd_finish_request() without having
* called dd_insert_requests(). Hence only update statistics for
* requests for which dd_insert_requests() has been called. See also
* blk_mq_request_bypass_insert().
* called dd_insert_requests(). Skip requests that bypassed I/O
* scheduling. See also blk_mq_request_bypass_insert().
*/
if (rq->elv.priv[0])
dd_count(dd, completed, prio);
if (!rq->elv.priv[0])
return;
atomic_inc(&per_prio->stats.completed);
if (blk_queue_is_zoned(q)) {
unsigned long flags;
@ -809,6 +862,7 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page) \
#define SHOW_JIFFIES(__FUNC, __VAR) SHOW_INT(__FUNC, jiffies_to_msecs(__VAR))
SHOW_JIFFIES(deadline_read_expire_show, dd->fifo_expire[DD_READ]);
SHOW_JIFFIES(deadline_write_expire_show, dd->fifo_expire[DD_WRITE]);
SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
SHOW_INT(deadline_front_merges_show, dd->front_merges);
SHOW_INT(deadline_async_depth_show, dd->front_merges);
@ -838,6 +892,7 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, msecs_to_jiffies)
STORE_JIFFIES(deadline_read_expire_store, &dd->fifo_expire[DD_READ], 0, INT_MAX);
STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MAX);
STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
STORE_INT(deadline_async_depth_store, &dd->front_merges, 1, INT_MAX);
@ -856,6 +911,7 @@ static struct elv_fs_entry deadline_attrs[] = {
DD_ATTR(front_merges),
DD_ATTR(async_depth),
DD_ATTR(fifo_batch),
DD_ATTR(prio_aging_expire),
__ATTR_NULL
};
@ -947,38 +1003,48 @@ static int dd_async_depth_show(void *data, struct seq_file *m)
return 0;
}
/* Number of requests queued for a given priority level. */
static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
{
return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
}
static int dd_queued_show(void *data, struct seq_file *m)
{
struct request_queue *q = data;
struct deadline_data *dd = q->elevator->elevator_data;
u32 rt, be, idle;
spin_lock(&dd->lock);
rt = dd_queued(dd, DD_RT_PRIO);
be = dd_queued(dd, DD_BE_PRIO);
idle = dd_queued(dd, DD_IDLE_PRIO);
spin_unlock(&dd->lock);
seq_printf(m, "%u %u %u\n", rt, be, idle);
seq_printf(m, "%u %u %u\n", dd_queued(dd, DD_RT_PRIO),
dd_queued(dd, DD_BE_PRIO),
dd_queued(dd, DD_IDLE_PRIO));
return 0;
}
/* Number of requests owned by the block driver for a given priority. */
static u32 dd_owned_by_driver(struct deadline_data *dd, enum dd_prio prio)
{
return dd_sum(dd, dispatched, prio) + dd_sum(dd, merged, prio)
- dd_sum(dd, completed, prio);
const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
lockdep_assert_held(&dd->lock);
return stats->dispatched + stats->merged -
atomic_read(&stats->completed);
}
static int dd_owned_by_driver_show(void *data, struct seq_file *m)
{
struct request_queue *q = data;
struct deadline_data *dd = q->elevator->elevator_data;
u32 rt, be, idle;
spin_lock(&dd->lock);
rt = dd_owned_by_driver(dd, DD_RT_PRIO);
be = dd_owned_by_driver(dd, DD_BE_PRIO);
idle = dd_owned_by_driver(dd, DD_IDLE_PRIO);
spin_unlock(&dd->lock);
seq_printf(m, "%u %u %u\n", rt, be, idle);
seq_printf(m, "%u %u %u\n", dd_owned_by_driver(dd, DD_RT_PRIO),
dd_owned_by_driver(dd, DD_BE_PRIO),
dd_owned_by_driver(dd, DD_IDLE_PRIO));
return 0;
}

View File

@ -2,6 +2,8 @@
#
# Partition configuration
#
menu "Partition Types"
config PARTITION_ADVANCED
bool "Advanced partition selection"
help
@ -267,3 +269,5 @@ config CMDLINE_PARTITION
help
Say Y here if you want to read the partition table from bootargs.
The format for the command line is just like mtdparts.
endmenu

View File

@ -5,6 +5,7 @@
* Copyright (C) 2020 Christoph Hellwig
*/
#include <linux/fs.h>
#include <linux/major.h>
#include <linux/slab.h>
#include <linux/ctype.h>
#include <linux/genhd.h>
@ -203,7 +204,7 @@ static ssize_t part_alignment_offset_show(struct device *dev,
struct block_device *bdev = dev_to_bdev(dev);
return sprintf(buf, "%u\n",
queue_limit_alignment_offset(&bdev->bd_disk->queue->limits,
queue_limit_alignment_offset(&bdev_get_queue(bdev)->limits,
bdev->bd_start_sect));
}
@ -213,7 +214,7 @@ static ssize_t part_discard_alignment_show(struct device *dev,
struct block_device *bdev = dev_to_bdev(dev);
return sprintf(buf, "%u\n",
queue_limit_discard_alignment(&bdev->bd_disk->queue->limits,
queue_limit_discard_alignment(&bdev_get_queue(bdev)->limits,
bdev->bd_start_sect));
}

View File

@ -5,7 +5,7 @@
*/
#include <linux/t10-pi.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/crc-t10dif.h>
#include <linux/module.h>
#include <net/checksum.h>

View File

@ -61,10 +61,10 @@
#include <linux/hdreg.h>
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/fs.h>
#include <linux/blk-mq.h>
#include <linux/elevator.h>
#include <linux/interrupt.h>
#include <linux/platform_device.h>

View File

@ -68,6 +68,7 @@
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/blk-mq.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/completion.h>
#include <linux/wait.h>

View File

@ -282,7 +282,7 @@ out:
return err;
}
static blk_qc_t brd_submit_bio(struct bio *bio)
static void brd_submit_bio(struct bio *bio)
{
struct brd_device *brd = bio->bi_bdev->bd_disk->private_data;
sector_t sector = bio->bi_iter.bi_sector;
@ -299,16 +299,14 @@ static blk_qc_t brd_submit_bio(struct bio *bio)
err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
bio_op(bio), sector);
if (err)
goto io_error;
if (err) {
bio_io_error(bio);
return;
}
sector += len >> SECTOR_SHIFT;
}
bio_endio(bio);
return BLK_QC_T_NONE;
io_error:
bio_io_error(bio);
return BLK_QC_T_NONE;
}
static int brd_rw_page(struct block_device *bdev, sector_t sector,

View File

@ -1448,7 +1448,7 @@ extern void conn_free_crypto(struct drbd_connection *connection);
/* drbd_req */
extern void do_submit(struct work_struct *ws);
extern void __drbd_make_request(struct drbd_device *, struct bio *);
extern blk_qc_t drbd_submit_bio(struct bio *bio);
void drbd_submit_bio(struct bio *bio);
extern int drbd_read_remote(struct drbd_device *device, struct drbd_request *req);
extern int is_valid_ar_handle(struct drbd_request *, sector_t);

View File

@ -1596,7 +1596,7 @@ void do_submit(struct work_struct *ws)
}
}
blk_qc_t drbd_submit_bio(struct bio *bio)
void drbd_submit_bio(struct bio *bio)
{
struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
@ -1609,7 +1609,6 @@ blk_qc_t drbd_submit_bio(struct bio *bio)
inc_ap_bio(device);
__drbd_make_request(device, bio);
return BLK_QC_T_NONE;
}
static bool net_timeout_reached(struct drbd_request *net_req,

View File

@ -184,6 +184,7 @@ static int print_unex = 1;
#include <linux/ioport.h>
#include <linux/interrupt.h>
#include <linux/init.h>
#include <linux/major.h>
#include <linux/platform_device.h>
#include <linux/mod_devicetable.h>
#include <linux/mutex.h>

View File

@ -272,19 +272,6 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
blk_mq_unfreeze_queue(lo->lo_queue);
}
/**
* loop_validate_block_size() - validates the passed in block size
* @bsize: size to validate
*/
static int
loop_validate_block_size(unsigned short bsize)
{
if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
return -EINVAL;
return 0;
}
/**
* loop_set_size() - sets device size and notifies userspace
* @lo: struct loop_device to set the size for
@ -1236,7 +1223,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
}
if (config->block_size) {
error = loop_validate_block_size(config->block_size);
error = blk_validate_block_size(config->block_size);
if (error)
goto out_unlock;
}
@ -1329,7 +1316,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
{
struct file *filp = NULL;
gfp_t gfp = lo->old_gfp_mask;
struct block_device *bdev = lo->lo_device;
int err = 0;
bool partscan = false;
int lo_number;
@ -1395,22 +1381,16 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
blk_queue_logical_block_size(lo->lo_queue, 512);
blk_queue_physical_block_size(lo->lo_queue, 512);
blk_queue_io_min(lo->lo_queue, 512);
if (bdev) {
invalidate_bdev(bdev);
bdev->bd_inode->i_mapping->wb_err = 0;
}
set_capacity(lo->lo_disk, 0);
invalidate_disk(lo->lo_disk);
loop_sysfs_exit(lo);
if (bdev) {
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
}
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
mapping_set_gfp_mask(filp->f_mapping, gfp);
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
lo_number = lo->lo_number;
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
out_unlock:
@ -1759,7 +1739,7 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
if (lo->lo_state != Lo_bound)
return -ENXIO;
err = loop_validate_block_size(arg);
err = blk_validate_block_size(arg);
if (err)
return err;

View File

@ -84,7 +84,7 @@ static bool n64cart_do_bvec(struct device *dev, struct bio_vec *bv, u32 pos)
return true;
}
static blk_qc_t n64cart_submit_bio(struct bio *bio)
static void n64cart_submit_bio(struct bio *bio)
{
struct bio_vec bvec;
struct bvec_iter iter;
@ -92,16 +92,14 @@ static blk_qc_t n64cart_submit_bio(struct bio *bio)
u32 pos = bio->bi_iter.bi_sector << SECTOR_SHIFT;
bio_for_each_segment(bvec, bio, iter) {
if (!n64cart_do_bvec(dev, &bvec, pos))
goto io_error;
if (!n64cart_do_bvec(dev, &bvec, pos)) {
bio_io_error(bio);
return;
}
pos += bvec.bv_len;
}
bio_endio(bio);
return BLK_QC_T_NONE;
io_error:
bio_io_error(bio);
return BLK_QC_T_NONE;
}
static const struct block_device_operations n64cart_fops = {

View File

@ -310,20 +310,13 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
nsock->sent = 0;
}
static void nbd_size_clear(struct nbd_device *nbd)
{
if (nbd->config->bytesize) {
set_capacity(nbd->disk, 0);
kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
}
}
static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
loff_t blksize)
{
if (!blksize)
blksize = 1u << NBD_DEF_BLKSIZE_BITS;
if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
if (blk_validate_block_size(blksize))
return -EINVAL;
nbd->config->bytesize = bytesize;
@ -1237,7 +1230,9 @@ static void nbd_config_put(struct nbd_device *nbd)
&nbd->config_lock)) {
struct nbd_config *config = nbd->config;
nbd_dev_dbg_close(nbd);
nbd_size_clear(nbd);
invalidate_disk(nbd->disk);
if (nbd->config->bytesize)
kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
if (test_and_clear_bit(NBD_RT_HAS_PID_FILE,
&config->runtime_flags))
device_remove_file(disk_to_dev(nbd->disk), &pid_attr);

View File

@ -1422,7 +1422,7 @@ static struct nullb_queue *nullb_to_queue(struct nullb *nullb)
return &nullb->queues[index];
}
static blk_qc_t null_submit_bio(struct bio *bio)
static void null_submit_bio(struct bio *bio)
{
sector_t sector = bio->bi_iter.bi_sector;
sector_t nr_sectors = bio_sectors(bio);
@ -1434,7 +1434,6 @@ static blk_qc_t null_submit_bio(struct bio *bio)
cmd->bio = bio;
null_handle_cmd(cmd, sector, nr_sectors, bio_op(bio));
return BLK_QC_T_NONE;
}
static bool should_timeout_request(struct request *rq)

View File

@ -2400,7 +2400,7 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
}
}
static blk_qc_t pkt_submit_bio(struct bio *bio)
static void pkt_submit_bio(struct bio *bio)
{
struct pktcdvd_device *pd;
char b[BDEVNAME_SIZE];
@ -2423,7 +2423,7 @@ static blk_qc_t pkt_submit_bio(struct bio *bio)
*/
if (bio_data_dir(bio) == READ) {
pkt_make_request_read(pd, bio);
return BLK_QC_T_NONE;
return;
}
if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
@ -2455,10 +2455,9 @@ static blk_qc_t pkt_submit_bio(struct bio *bio)
pkt_make_request_write(bio->bi_bdev->bd_disk->queue, split);
} while (split != bio);
return BLK_QC_T_NONE;
return;
end_io:
bio_io_error(bio);
return BLK_QC_T_NONE;
}
static void pkt_init_queue(struct pktcdvd_device *pd)

View File

@ -578,7 +578,7 @@ out:
return next;
}
static blk_qc_t ps3vram_submit_bio(struct bio *bio)
static void ps3vram_submit_bio(struct bio *bio)
{
struct ps3_system_bus_device *dev = bio->bi_bdev->bd_disk->private_data;
struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
@ -594,13 +594,11 @@ static blk_qc_t ps3vram_submit_bio(struct bio *bio)
spin_unlock_irq(&priv->lock);
if (busy)
return BLK_QC_T_NONE;
return;
do {
bio = ps3vram_do_bio(dev, bio);
} while (bio);
return BLK_QC_T_NONE;
}
static const struct block_device_operations ps3vram_fops = {

View File

@ -836,7 +836,7 @@ struct rbd_options {
u32 alloc_hint_flags; /* CEPH_OSD_OP_ALLOC_HINT_FLAG_* */
};
#define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ
#define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_DEFAULT_RQ
#define RBD_ALLOC_SIZE_DEFAULT (64 * 1024)
#define RBD_LOCK_TIMEOUT_DEFAULT 0 /* no timeout */
#define RBD_READ_ONLY_DEFAULT false

View File

@ -1176,7 +1176,7 @@ static blk_status_t rnbd_queue_rq(struct blk_mq_hw_ctx *hctx,
return ret;
}
static int rnbd_rdma_poll(struct blk_mq_hw_ctx *hctx)
static int rnbd_rdma_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
{
struct rnbd_queue *q = hctx->driver_data;
struct rnbd_clt_dev *dev = q->dev;

View File

@ -10,7 +10,7 @@
#define RNBD_PROTO_H
#include <linux/types.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/limits.h>
#include <linux/inet.h>
#include <linux/in.h>

View File

@ -50,7 +50,7 @@ struct rsxx_bio_meta {
static struct kmem_cache *bio_meta_pool;
static blk_qc_t rsxx_submit_bio(struct bio *bio);
static void rsxx_submit_bio(struct bio *bio);
/*----------------- Block Device Operations -----------------*/
static int rsxx_blkdev_ioctl(struct block_device *bdev,
@ -120,7 +120,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
}
}
static blk_qc_t rsxx_submit_bio(struct bio *bio)
static void rsxx_submit_bio(struct bio *bio)
{
struct rsxx_cardinfo *card = bio->bi_bdev->bd_disk->private_data;
struct rsxx_bio_meta *bio_meta;
@ -169,7 +169,7 @@ static blk_qc_t rsxx_submit_bio(struct bio *bio)
if (st)
goto queue_err;
return BLK_QC_T_NONE;
return;
queue_err:
kmem_cache_free(bio_meta_pool, bio_meta);
@ -177,7 +177,6 @@ req_err:
if (st)
bio->bi_status = st;
bio_endio(bio);
return BLK_QC_T_NONE;
}
/*----------------- Device Setup -------------------*/

View File

@ -16,6 +16,7 @@
#include <linux/fd.h>
#include <linux/slab.h>
#include <linux/blk-mq.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/hdreg.h>
#include <linux/kernel.h>

View File

@ -815,9 +815,17 @@ static int virtblk_probe(struct virtio_device *vdev)
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
struct virtio_blk_config, blk_size,
&blk_size);
if (!err)
if (!err) {
err = blk_validate_block_size(blk_size);
if (err) {
dev_err(&vdev->dev,
"virtio_blk: invalid block size: 0x%x\n",
blk_size);
goto out_cleanup_disk;
}
blk_queue_logical_block_size(q, blk_size);
else
} else
blk_size = queue_logical_block_size(q);
/* Use topology information if available */

View File

@ -42,6 +42,7 @@
#include <linux/cdrom.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/major.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>
#include <linux/bitmap.h>

View File

@ -1598,22 +1598,18 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
/*
* Handler function for all zram I/O requests.
*/
static blk_qc_t zram_submit_bio(struct bio *bio)
static void zram_submit_bio(struct bio *bio)
{
struct zram *zram = bio->bi_bdev->bd_disk->private_data;
if (!valid_io_request(zram, bio->bi_iter.bi_sector,
bio->bi_iter.bi_size)) {
atomic64_inc(&zram->stats.invalid_io);
goto error;
bio_io_error(bio);
return;
}
__zram_make_request(zram, bio);
return BLK_QC_T_NONE;
error:
bio_io_error(bio);
return BLK_QC_T_NONE;
}
static void zram_slot_free_notify(struct block_device *bdev,

View File

@ -30,6 +30,7 @@
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include <linux/sched/clock.h>
struct drm_i915_private;
struct timer_list;

View File

@ -1163,7 +1163,7 @@ static void quit_max_writeback_rate(struct cache_set *c,
/* Cached devices - read & write stuff */
blk_qc_t cached_dev_submit_bio(struct bio *bio)
void cached_dev_submit_bio(struct bio *bio)
{
struct search *s;
struct block_device *orig_bdev = bio->bi_bdev;
@ -1176,7 +1176,7 @@ blk_qc_t cached_dev_submit_bio(struct bio *bio)
dc->io_disable)) {
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
return BLK_QC_T_NONE;
return;
}
if (likely(d->c)) {
@ -1222,8 +1222,6 @@ blk_qc_t cached_dev_submit_bio(struct bio *bio)
} else
/* I/O request sent to backing device */
detached_dev_do_request(d, bio, orig_bdev, start_time);
return BLK_QC_T_NONE;
}
static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
@ -1273,7 +1271,7 @@ static void flash_dev_nodata(struct closure *cl)
continue_at(cl, search_free, NULL);
}
blk_qc_t flash_dev_submit_bio(struct bio *bio)
void flash_dev_submit_bio(struct bio *bio)
{
struct search *s;
struct closure *cl;
@ -1282,7 +1280,7 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
if (unlikely(d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags))) {
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
return BLK_QC_T_NONE;
return;
}
s = search_alloc(bio, d, bio->bi_bdev, bio_start_io_acct(bio));
@ -1298,7 +1296,7 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
continue_at_nobarrier(&s->cl,
flash_dev_nodata,
bcache_wq);
return BLK_QC_T_NONE;
return;
} else if (bio_data_dir(bio)) {
bch_keybuf_check_overlapping(&s->iop.c->moving_gc_keys,
&KEY(d->id, bio->bi_iter.bi_sector, 0),
@ -1314,7 +1312,6 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
}
continue_at(cl, search_free, NULL);
return BLK_QC_T_NONE;
}
static int flash_dev_ioctl(struct bcache_device *d, fmode_t mode,

View File

@ -37,10 +37,10 @@ unsigned int bch_get_congested(const struct cache_set *c);
void bch_data_insert(struct closure *cl);
void bch_cached_dev_request_init(struct cached_dev *dc);
blk_qc_t cached_dev_submit_bio(struct bio *bio);
void cached_dev_submit_bio(struct bio *bio);
void bch_flash_dev_request_init(struct bcache_device *d);
blk_qc_t flash_dev_submit_bio(struct bio *bio);
void flash_dev_submit_bio(struct bio *bio);
extern struct kmem_cache *bch_search_cache;

View File

@ -8,6 +8,7 @@
#define DM_BIO_RECORD_H
#include <linux/bio.h>
#include <linux/blk-integrity.h>
/*
* There are lots of mutable fields in the bio struct that get

View File

@ -13,7 +13,7 @@
#include <linux/ktime.h>
#include <linux/genhd.h>
#include <linux/blk-mq.h>
#include <linux/keyslot-manager.h>
#include <linux/blk-crypto-profile.h>
#include <trace/events/block.h>
@ -200,7 +200,7 @@ struct dm_table {
struct dm_md_mempools *mempools;
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
struct blk_keyslot_manager *ksm;
struct blk_crypto_profile *crypto_profile;
#endif
};

View File

@ -15,6 +15,7 @@
#include <linux/key.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/crypto.h>

View File

@ -12,6 +12,7 @@
#include "dm-ima.h"
#include <linux/ima.h>
#include <linux/sched/mm.h>
#include <crypto/hash.h>
#include <linux/crypto.h>
#include <crypto/hash_info.h>

View File

@ -27,6 +27,7 @@
#include <linux/blkdev.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/sched/clock.h>
#define DM_MSG_PREFIX "multipath historical-service-time"

View File

@ -7,7 +7,6 @@
#include "dm-core.h"
#include "dm-rq.h"
#include <linux/elevator.h> /* for rq_end_sector() */
#include <linux/blk-mq.h>
#define DM_MSG_PREFIX "core-rq"

View File

@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/namei.h>
#include <linux/ctype.h>
#include <linux/string.h>
@ -169,7 +170,7 @@ static void free_devices(struct list_head *devices, struct mapped_device *md)
}
}
static void dm_table_destroy_keyslot_manager(struct dm_table *t);
static void dm_table_destroy_crypto_profile(struct dm_table *t);
void dm_table_destroy(struct dm_table *t)
{
@ -199,7 +200,7 @@ void dm_table_destroy(struct dm_table *t)
dm_free_md_mempools(t->mempools);
dm_table_destroy_keyslot_manager(t);
dm_table_destroy_crypto_profile(t);
kfree(t);
}
@ -1186,8 +1187,8 @@ static int dm_table_register_integrity(struct dm_table *t)
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
struct dm_keyslot_manager {
struct blk_keyslot_manager ksm;
struct dm_crypto_profile {
struct blk_crypto_profile profile;
struct mapped_device *md;
};
@ -1213,13 +1214,11 @@ static int dm_keyslot_evict_callback(struct dm_target *ti, struct dm_dev *dev,
* When an inline encryption key is evicted from a device-mapper device, evict
* it from all the underlying devices.
*/
static int dm_keyslot_evict(struct blk_keyslot_manager *ksm,
static int dm_keyslot_evict(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key, unsigned int slot)
{
struct dm_keyslot_manager *dksm = container_of(ksm,
struct dm_keyslot_manager,
ksm);
struct mapped_device *md = dksm->md;
struct mapped_device *md =
container_of(profile, struct dm_crypto_profile, profile)->md;
struct dm_keyslot_evict_args args = { key };
struct dm_table *t;
int srcu_idx;
@ -1239,150 +1238,148 @@ static int dm_keyslot_evict(struct blk_keyslot_manager *ksm,
return args.err;
}
static const struct blk_ksm_ll_ops dm_ksm_ll_ops = {
.keyslot_evict = dm_keyslot_evict,
};
static int device_intersect_crypto_modes(struct dm_target *ti,
struct dm_dev *dev, sector_t start,
sector_t len, void *data)
static int
device_intersect_crypto_capabilities(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
struct blk_keyslot_manager *parent = data;
struct blk_keyslot_manager *child = bdev_get_queue(dev->bdev)->ksm;
struct blk_crypto_profile *parent = data;
struct blk_crypto_profile *child =
bdev_get_queue(dev->bdev)->crypto_profile;
blk_ksm_intersect_modes(parent, child);
blk_crypto_intersect_capabilities(parent, child);
return 0;
}
void dm_destroy_keyslot_manager(struct blk_keyslot_manager *ksm)
void dm_destroy_crypto_profile(struct blk_crypto_profile *profile)
{
struct dm_keyslot_manager *dksm = container_of(ksm,
struct dm_keyslot_manager,
ksm);
struct dm_crypto_profile *dmcp = container_of(profile,
struct dm_crypto_profile,
profile);
if (!ksm)
if (!profile)
return;
blk_ksm_destroy(ksm);
kfree(dksm);
blk_crypto_profile_destroy(profile);
kfree(dmcp);
}
static void dm_table_destroy_keyslot_manager(struct dm_table *t)
static void dm_table_destroy_crypto_profile(struct dm_table *t)
{
dm_destroy_keyslot_manager(t->ksm);
t->ksm = NULL;
dm_destroy_crypto_profile(t->crypto_profile);
t->crypto_profile = NULL;
}
/*
* Constructs and initializes t->ksm with a keyslot manager that
* represents the common set of crypto capabilities of the devices
* described by the dm_table. However, if the constructed keyslot
* manager does not support a superset of the crypto capabilities
* supported by the current keyslot manager of the mapped_device,
* it returns an error instead, since we don't support restricting
* crypto capabilities on table changes. Finally, if the constructed
* keyslot manager doesn't actually support any crypto modes at all,
* it just returns NULL.
* Constructs and initializes t->crypto_profile with a crypto profile that
* represents the common set of crypto capabilities of the devices described by
* the dm_table. However, if the constructed crypto profile doesn't support all
* crypto capabilities that are supported by the current mapped_device, it
* returns an error instead, since we don't support removing crypto capabilities
* on table changes. Finally, if the constructed crypto profile is "empty" (has
* no crypto capabilities at all), it just sets t->crypto_profile to NULL.
*/
static int dm_table_construct_keyslot_manager(struct dm_table *t)
static int dm_table_construct_crypto_profile(struct dm_table *t)
{
struct dm_keyslot_manager *dksm;
struct blk_keyslot_manager *ksm;
struct dm_crypto_profile *dmcp;
struct blk_crypto_profile *profile;
struct dm_target *ti;
unsigned int i;
bool ksm_is_empty = true;
bool empty_profile = true;
dksm = kmalloc(sizeof(*dksm), GFP_KERNEL);
if (!dksm)
dmcp = kmalloc(sizeof(*dmcp), GFP_KERNEL);
if (!dmcp)
return -ENOMEM;
dksm->md = t->md;
dmcp->md = t->md;
ksm = &dksm->ksm;
blk_ksm_init_passthrough(ksm);
ksm->ksm_ll_ops = dm_ksm_ll_ops;
ksm->max_dun_bytes_supported = UINT_MAX;
memset(ksm->crypto_modes_supported, 0xFF,
sizeof(ksm->crypto_modes_supported));
profile = &dmcp->profile;
blk_crypto_profile_init(profile, 0);
profile->ll_ops.keyslot_evict = dm_keyslot_evict;
profile->max_dun_bytes_supported = UINT_MAX;
memset(profile->modes_supported, 0xFF,
sizeof(profile->modes_supported));
for (i = 0; i < dm_table_get_num_targets(t); i++) {
ti = dm_table_get_target(t, i);
if (!dm_target_passes_crypto(ti->type)) {
blk_ksm_intersect_modes(ksm, NULL);
blk_crypto_intersect_capabilities(profile, NULL);
break;
}
if (!ti->type->iterate_devices)
continue;
ti->type->iterate_devices(ti, device_intersect_crypto_modes,
ksm);
ti->type->iterate_devices(ti,
device_intersect_crypto_capabilities,
profile);
}
if (t->md->queue && !blk_ksm_is_superset(ksm, t->md->queue->ksm)) {
if (t->md->queue &&
!blk_crypto_has_capabilities(profile,
t->md->queue->crypto_profile)) {
DMWARN("Inline encryption capabilities of new DM table were more restrictive than the old table's. This is not supported!");
dm_destroy_keyslot_manager(ksm);
dm_destroy_crypto_profile(profile);
return -EINVAL;
}
/*
* If the new KSM doesn't actually support any crypto modes, we may as
* well represent it with a NULL ksm.
* If the new profile doesn't actually support any crypto capabilities,
* we may as well represent it with a NULL profile.
*/
ksm_is_empty = true;
for (i = 0; i < ARRAY_SIZE(ksm->crypto_modes_supported); i++) {
if (ksm->crypto_modes_supported[i]) {
ksm_is_empty = false;
for (i = 0; i < ARRAY_SIZE(profile->modes_supported); i++) {
if (profile->modes_supported[i]) {
empty_profile = false;
break;
}
}
if (ksm_is_empty) {
dm_destroy_keyslot_manager(ksm);
ksm = NULL;
if (empty_profile) {
dm_destroy_crypto_profile(profile);
profile = NULL;
}
/*
* t->ksm is only set temporarily while the table is being set
* up, and it gets set to NULL after the capabilities have
* been transferred to the request_queue.
* t->crypto_profile is only set temporarily while the table is being
* set up, and it gets set to NULL after the profile has been
* transferred to the request_queue.
*/
t->ksm = ksm;
t->crypto_profile = profile;
return 0;
}
static void dm_update_keyslot_manager(struct request_queue *q,
struct dm_table *t)
static void dm_update_crypto_profile(struct request_queue *q,
struct dm_table *t)
{
if (!t->ksm)
if (!t->crypto_profile)
return;
/* Make the ksm less restrictive */
if (!q->ksm) {
blk_ksm_register(t->ksm, q);
/* Make the crypto profile less restrictive. */
if (!q->crypto_profile) {
blk_crypto_register(t->crypto_profile, q);
} else {
blk_ksm_update_capabilities(q->ksm, t->ksm);
dm_destroy_keyslot_manager(t->ksm);
blk_crypto_update_capabilities(q->crypto_profile,
t->crypto_profile);
dm_destroy_crypto_profile(t->crypto_profile);
}
t->ksm = NULL;
t->crypto_profile = NULL;
}
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static int dm_table_construct_keyslot_manager(struct dm_table *t)
static int dm_table_construct_crypto_profile(struct dm_table *t)
{
return 0;
}
void dm_destroy_keyslot_manager(struct blk_keyslot_manager *ksm)
void dm_destroy_crypto_profile(struct blk_crypto_profile *profile)
{
}
static void dm_table_destroy_keyslot_manager(struct dm_table *t)
static void dm_table_destroy_crypto_profile(struct dm_table *t)
{
}
static void dm_update_keyslot_manager(struct request_queue *q,
struct dm_table *t)
static void dm_update_crypto_profile(struct request_queue *q,
struct dm_table *t)
{
}
@ -1414,9 +1411,9 @@ int dm_table_complete(struct dm_table *t)
return r;
}
r = dm_table_construct_keyslot_manager(t);
r = dm_table_construct_crypto_profile(t);
if (r) {
DMERR("could not construct keyslot manager.");
DMERR("could not construct crypto profile.");
return r;
}
@ -2070,7 +2067,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
return r;
}
dm_update_keyslot_manager(q, t);
dm_update_crypto_profile(q, t);
disk_update_readahead(t->md->disk);
return 0;

View File

@ -18,6 +18,7 @@
#include "dm-verity-verify-sig.h"
#include <linux/module.h>
#include <linux/reboot.h>
#include <linux/scatterlist.h>
#define DM_MSG_PREFIX "verity"

View File

@ -29,7 +29,7 @@
#include <linux/refcount.h>
#include <linux/part_stat.h>
#include <linux/blk-crypto.h>
#include <linux/keyslot-manager.h>
#include <linux/blk-crypto-profile.h>
#define DM_MSG_PREFIX "core"
@ -1183,14 +1183,13 @@ static noinline void __set_swap_bios_limit(struct mapped_device *md, int latch)
mutex_unlock(&md->swap_bios_lock);
}
static blk_qc_t __map_bio(struct dm_target_io *tio)
static void __map_bio(struct dm_target_io *tio)
{
int r;
sector_t sector;
struct bio *clone = &tio->clone;
struct dm_io *io = tio->io;
struct dm_target *ti = tio->ti;
blk_qc_t ret = BLK_QC_T_NONE;
clone->bi_end_io = clone_endio;
@ -1226,7 +1225,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
case DM_MAPIO_REMAPPED:
/* the bio has been remapped so dispatch it */
trace_block_bio_remap(clone, bio_dev(io->orig_bio), sector);
ret = submit_bio_noacct(clone);
submit_bio_noacct(clone);
break;
case DM_MAPIO_KILL:
if (unlikely(swap_bios_limit(ti, clone))) {
@ -1248,8 +1247,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
DMWARN("unimplemented target map return value: %d", r);
BUG();
}
return ret;
}
static void bio_setup_sector(struct bio *bio, sector_t sector, unsigned len)
@ -1336,7 +1333,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
}
}
static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
static void __clone_and_map_simple_bio(struct clone_info *ci,
struct dm_target_io *tio, unsigned *len)
{
struct bio *clone = &tio->clone;
@ -1346,8 +1343,7 @@ static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
__bio_clone_fast(clone, ci->bio);
if (len)
bio_setup_sector(clone, ci->sector, *len);
return __map_bio(tio);
__map_bio(tio);
}
static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
@ -1361,7 +1357,7 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
while ((bio = bio_list_pop(&blist))) {
tio = container_of(bio, struct dm_target_io, clone);
(void) __clone_and_map_simple_bio(ci, tio, len);
__clone_and_map_simple_bio(ci, tio, len);
}
}
@ -1405,7 +1401,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
free_tio(tio);
return r;
}
(void) __map_bio(tio);
__map_bio(tio);
return 0;
}
@ -1520,11 +1516,10 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
/*
* Entry point to split a bio into clones and submit them to the targets.
*/
static blk_qc_t __split_and_process_bio(struct mapped_device *md,
static void __split_and_process_bio(struct mapped_device *md,
struct dm_table *map, struct bio *bio)
{
struct clone_info ci;
blk_qc_t ret = BLK_QC_T_NONE;
int error = 0;
init_clone_info(&ci, md, map, bio);
@ -1567,19 +1562,17 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
bio_chain(b, bio);
trace_block_split(b, bio->bi_iter.bi_sector);
ret = submit_bio_noacct(bio);
submit_bio_noacct(bio);
}
}
/* drop the extra reference count */
dm_io_dec_pending(ci.io, errno_to_blk_status(error));
return ret;
}
static blk_qc_t dm_submit_bio(struct bio *bio)
static void dm_submit_bio(struct bio *bio)
{
struct mapped_device *md = bio->bi_bdev->bd_disk->private_data;
blk_qc_t ret = BLK_QC_T_NONE;
int srcu_idx;
struct dm_table *map;
@ -1609,10 +1602,9 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
if (is_abnormal_io(bio))
blk_queue_split(&bio);
ret = __split_and_process_bio(md, map, bio);
__split_and_process_bio(md, map, bio);
out:
dm_put_live_table(md, srcu_idx);
return ret;
}
/*-----------------------------------------------------------------
@ -1671,14 +1663,14 @@ static const struct dax_operations dm_dax_ops;
static void dm_wq_work(struct work_struct *work);
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
static void dm_queue_destroy_keyslot_manager(struct request_queue *q)
static void dm_queue_destroy_crypto_profile(struct request_queue *q)
{
dm_destroy_keyslot_manager(q->ksm);
dm_destroy_crypto_profile(q->crypto_profile);
}
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static inline void dm_queue_destroy_keyslot_manager(struct request_queue *q)
static inline void dm_queue_destroy_crypto_profile(struct request_queue *q)
{
}
#endif /* !CONFIG_BLK_INLINE_ENCRYPTION */
@ -1704,7 +1696,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
dm_sysfs_exit(md);
del_gendisk(md->disk);
}
dm_queue_destroy_keyslot_manager(md->queue);
dm_queue_destroy_crypto_profile(md->queue);
blk_cleanup_disk(md->disk);
}

View File

@ -41,6 +41,7 @@
#include <linux/sched/signal.h>
#include <linux/kthread.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/badblocks.h>
#include <linux/sysctl.h>
#include <linux/seq_file.h>
@ -51,6 +52,7 @@
#include <linux/hdreg.h>
#include <linux/proc_fs.h>
#include <linux/random.h>
#include <linux/major.h>
#include <linux/module.h>
#include <linux/reboot.h>
#include <linux/file.h>
@ -441,19 +443,19 @@ check_suspended:
}
EXPORT_SYMBOL(md_handle_request);
static blk_qc_t md_submit_bio(struct bio *bio)
static void md_submit_bio(struct bio *bio)
{
const int rw = bio_data_dir(bio);
struct mddev *mddev = bio->bi_bdev->bd_disk->private_data;
if (mddev == NULL || mddev->pers == NULL) {
bio_io_error(bio);
return BLK_QC_T_NONE;
return;
}
if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
bio_io_error(bio);
return BLK_QC_T_NONE;
return;
}
blk_queue_split(&bio);
@ -462,15 +464,13 @@ static blk_qc_t md_submit_bio(struct bio *bio)
if (bio_sectors(bio) != 0)
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
return BLK_QC_T_NONE;
return;
}
/* bio could be mergeable after passing to underlayer */
bio->bi_opf &= ~REQ_NOMERGE;
md_handle_request(mddev, bio);
return BLK_QC_T_NONE;
}
/* mddev_suspend makes sure no new requests are submitted

View File

@ -16,13 +16,13 @@ void mmc_crypto_set_initial_state(struct mmc_host *host)
{
/* Reset might clear all keys, so reprogram all the keys. */
if (host->caps2 & MMC_CAP2_CRYPTO)
blk_ksm_reprogram_all_keys(&host->ksm);
blk_crypto_reprogram_all_keys(&host->crypto_profile);
}
void mmc_crypto_setup_queue(struct request_queue *q, struct mmc_host *host)
{
if (host->caps2 & MMC_CAP2_CRYPTO)
blk_ksm_register(&host->ksm, q);
blk_crypto_register(&host->crypto_profile, q);
}
EXPORT_SYMBOL_GPL(mmc_crypto_setup_queue);
@ -30,12 +30,15 @@ void mmc_crypto_prepare_req(struct mmc_queue_req *mqrq)
{
struct request *req = mmc_queue_req_to_req(mqrq);
struct mmc_request *mrq = &mqrq->brq.mrq;
struct blk_crypto_keyslot *keyslot;
if (!req->crypt_ctx)
return;
mrq->crypto_ctx = req->crypt_ctx;
if (req->crypt_keyslot)
mrq->crypto_key_slot = blk_ksm_get_slot_idx(req->crypt_keyslot);
keyslot = req->crypt_keyslot;
if (keyslot)
mrq->crypto_key_slot = blk_crypto_keyslot_index(keyslot);
}
EXPORT_SYMBOL_GPL(mmc_crypto_prepare_req);

View File

@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/stat.h>
#include <linux/pm_runtime.h>
#include <linux/scatterlist.h>
#include <linux/mmc/host.h>
#include <linux/mmc/card.h>

View File

@ -6,7 +6,7 @@
*/
#include <linux/blk-crypto.h>
#include <linux/keyslot-manager.h>
#include <linux/blk-crypto-profile.h>
#include <linux/mmc/host.h>
#include "cqhci-crypto.h"
@ -23,9 +23,10 @@ static const struct cqhci_crypto_alg_entry {
};
static inline struct cqhci_host *
cqhci_host_from_ksm(struct blk_keyslot_manager *ksm)
cqhci_host_from_crypto_profile(struct blk_crypto_profile *profile)
{
struct mmc_host *mmc = container_of(ksm, struct mmc_host, ksm);
struct mmc_host *mmc =
container_of(profile, struct mmc_host, crypto_profile);
return mmc->cqe_private;
}
@ -57,12 +58,12 @@ static int cqhci_crypto_program_key(struct cqhci_host *cq_host,
return 0;
}
static int cqhci_crypto_keyslot_program(struct blk_keyslot_manager *ksm,
static int cqhci_crypto_keyslot_program(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key,
unsigned int slot)
{
struct cqhci_host *cq_host = cqhci_host_from_ksm(ksm);
struct cqhci_host *cq_host = cqhci_host_from_crypto_profile(profile);
const union cqhci_crypto_cap_entry *ccap_array =
cq_host->crypto_cap_array;
const struct cqhci_crypto_alg_entry *alg =
@ -115,11 +116,11 @@ static int cqhci_crypto_clear_keyslot(struct cqhci_host *cq_host, int slot)
return cqhci_crypto_program_key(cq_host, &cfg, slot);
}
static int cqhci_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
static int cqhci_crypto_keyslot_evict(struct blk_crypto_profile *profile,
const struct blk_crypto_key *key,
unsigned int slot)
{
struct cqhci_host *cq_host = cqhci_host_from_ksm(ksm);
struct cqhci_host *cq_host = cqhci_host_from_crypto_profile(profile);
return cqhci_crypto_clear_keyslot(cq_host, slot);
}
@ -132,7 +133,7 @@ static int cqhci_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
* "enabled" when these are called, i.e. CQHCI_ENABLE might not be set in the
* CQHCI_CFG register. But the hardware allows that.
*/
static const struct blk_ksm_ll_ops cqhci_ksm_ops = {
static const struct blk_crypto_ll_ops cqhci_crypto_ops = {
.keyslot_program = cqhci_crypto_keyslot_program,
.keyslot_evict = cqhci_crypto_keyslot_evict,
};
@ -157,8 +158,8 @@ cqhci_find_blk_crypto_mode(union cqhci_crypto_cap_entry cap)
*
* If the driver previously set MMC_CAP2_CRYPTO and the CQE declares
* CQHCI_CAP_CS, initialize the crypto support. This involves reading the
* crypto capability registers, initializing the keyslot manager, clearing all
* keyslots, and enabling 128-bit task descriptors.
* crypto capability registers, initializing the blk_crypto_profile, clearing
* all keyslots, and enabling 128-bit task descriptors.
*
* Return: 0 if crypto was initialized or isn't supported; whether
* MMC_CAP2_CRYPTO remains set indicates which one of those cases it is.
@ -168,7 +169,7 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
{
struct mmc_host *mmc = cq_host->mmc;
struct device *dev = mmc_dev(mmc);
struct blk_keyslot_manager *ksm = &mmc->ksm;
struct blk_crypto_profile *profile = &mmc->crypto_profile;
unsigned int num_keyslots;
unsigned int cap_idx;
enum blk_crypto_mode_num blk_mode_num;
@ -199,15 +200,15 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
*/
num_keyslots = cq_host->crypto_capabilities.config_count + 1;
err = devm_blk_ksm_init(dev, ksm, num_keyslots);
err = devm_blk_crypto_profile_init(dev, profile, num_keyslots);
if (err)
goto out;
ksm->ksm_ll_ops = cqhci_ksm_ops;
ksm->dev = dev;
profile->ll_ops = cqhci_crypto_ops;
profile->dev = dev;
/* Unfortunately, CQHCI crypto only supports 32 DUN bits. */
ksm->max_dun_bytes_supported = 4;
profile->max_dun_bytes_supported = 4;
/*
* Cache all the crypto capabilities and advertise the supported crypto
@ -223,7 +224,7 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
cq_host->crypto_cap_array[cap_idx]);
if (blk_mode_num == BLK_ENCRYPTION_MODE_INVALID)
continue;
ksm->crypto_modes_supported[blk_mode_num] |=
profile->modes_supported[blk_mode_num] |=
cq_host->crypto_cap_array[cap_idx].sdus_mask * 512;
}

View File

@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/major.h>
#include <linux/backing-dev.h>
#include <linux/blkdev.h>
#include <linux/fs_context.h>
#include "mtdcore.h"

View File

@ -162,7 +162,7 @@ static int nsblk_do_bvec(struct nd_namespace_blk *nsblk,
return err;
}
static blk_qc_t nd_blk_submit_bio(struct bio *bio)
static void nd_blk_submit_bio(struct bio *bio)
{
struct bio_integrity_payload *bip;
struct nd_namespace_blk *nsblk = bio->bi_bdev->bd_disk->private_data;
@ -173,7 +173,7 @@ static blk_qc_t nd_blk_submit_bio(struct bio *bio)
bool do_acct;
if (!bio_integrity_prep(bio))
return BLK_QC_T_NONE;
return;
bip = bio_integrity(bio);
rw = bio_data_dir(bio);
@ -199,7 +199,6 @@ static blk_qc_t nd_blk_submit_bio(struct bio *bio)
bio_end_io_acct(bio, start);
bio_endio(bio);
return BLK_QC_T_NONE;
}
static int nsblk_rw_bytes(struct nd_namespace_common *ndns,

View File

@ -1440,7 +1440,7 @@ static int btt_do_bvec(struct btt *btt, struct bio_integrity_payload *bip,
return ret;
}
static blk_qc_t btt_submit_bio(struct bio *bio)
static void btt_submit_bio(struct bio *bio)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
struct btt *btt = bio->bi_bdev->bd_disk->private_data;
@ -1451,7 +1451,7 @@ static blk_qc_t btt_submit_bio(struct bio *bio)
bool do_acct;
if (!bio_integrity_prep(bio))
return BLK_QC_T_NONE;
return;
do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
if (do_acct)
@ -1483,7 +1483,6 @@ static blk_qc_t btt_submit_bio(struct bio *bio)
bio_end_io_acct(bio, start);
bio_endio(bio);
return BLK_QC_T_NONE;
}
static int btt_rw_page(struct block_device *bdev, sector_t sector,

View File

@ -7,6 +7,7 @@
#include <linux/export.h>
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/device.h>
#include <linux/ctype.h>
#include <linux/ndctl.h>

View File

@ -190,7 +190,7 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
return rc;
}
static blk_qc_t pmem_submit_bio(struct bio *bio)
static void pmem_submit_bio(struct bio *bio)
{
int ret = 0;
blk_status_t rc = 0;
@ -229,7 +229,6 @@ static blk_qc_t pmem_submit_bio(struct bio *bio)
bio->bi_status = errno_to_blk_status(ret);
bio_endio(bio);
return BLK_QC_T_NONE;
}
static int pmem_rw_page(struct block_device *bdev, sector_t sector,

View File

@ -6,6 +6,7 @@
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/blk-integrity.h>
#include <linux/compat.h>
#include <linux/delay.h>
#include <linux/errno.h>
@ -118,25 +119,6 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
struct nvme_command *cmd);
/*
* Prepare a queue for teardown.
*
* This must forcibly unquiesce queues to avoid blocking dispatch, and only set
* the capacity to 0 after that to avoid blocking dispatchers that may be
* holding bd_butex. This will end buffered writers dirtying pages that can't
* be synced.
*/
static void nvme_set_queue_dying(struct nvme_ns *ns)
{
if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
return;
blk_set_queue_dying(ns->queue);
blk_mq_unquiesce_queue(ns->queue);
set_capacity_and_notify(ns->disk, 0);
}
void nvme_queue_scan(struct nvme_ctrl *ctrl)
{
/*
@ -345,15 +327,19 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
return RETRY;
}
static inline void nvme_end_req(struct request *req)
static inline void nvme_end_req_zoned(struct request *req)
{
blk_status_t status = nvme_error_status(nvme_req(req)->status);
if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
req_op(req) == REQ_OP_ZONE_APPEND)
req->__sector = nvme_lba_to_sect(req->q->queuedata,
le64_to_cpu(nvme_req(req)->result.u64));
}
static inline void nvme_end_req(struct request *req)
{
blk_status_t status = nvme_error_status(nvme_req(req)->status);
nvme_end_req_zoned(req);
nvme_trace_bio_complete(req);
blk_mq_end_request(req, status);
}
@ -380,6 +366,13 @@ void nvme_complete_rq(struct request *req)
}
EXPORT_SYMBOL_GPL(nvme_complete_rq);
void nvme_complete_batch_req(struct request *req)
{
nvme_cleanup_cmd(req);
nvme_end_req_zoned(req);
}
EXPORT_SYMBOL_GPL(nvme_complete_batch_req);
/*
* Called to unwind from ->queue_rq on a failed command submission so that the
* multipathing code gets called to potentially failover to another path.
@ -631,7 +624,7 @@ static inline void nvme_init_request(struct request *req,
req->cmd_flags |= REQ_FAILFAST_DRIVER;
if (req->mq_hctx->type == HCTX_TYPE_POLL)
req->cmd_flags |= REQ_HIPRI;
req->cmd_flags |= REQ_POLLED;
nvme_clear_nvme_request(req);
memcpy(nvme_req(req)->cmd, cmd, sizeof(*cmd));
}
@ -4473,6 +4466,37 @@ out:
}
EXPORT_SYMBOL_GPL(nvme_init_ctrl);
static void nvme_start_ns_queue(struct nvme_ns *ns)
{
if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
blk_mq_unquiesce_queue(ns->queue);
}
static void nvme_stop_ns_queue(struct nvme_ns *ns)
{
if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
blk_mq_quiesce_queue(ns->queue);
}
/*
* Prepare a queue for teardown.
*
* This must forcibly unquiesce queues to avoid blocking dispatch, and only set
* the capacity to 0 after that to avoid blocking dispatchers that may be
* holding bd_butex. This will end buffered writers dirtying pages that can't
* be synced.
*/
static void nvme_set_queue_dying(struct nvme_ns *ns)
{
if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
return;
blk_set_queue_dying(ns->queue);
nvme_start_ns_queue(ns);
set_capacity_and_notify(ns->disk, 0);
}
/**
* nvme_kill_queues(): Ends all namespace queues
* @ctrl: the dead controller that needs to end
@ -4488,7 +4512,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
/* Forcibly unquiesce queues to avoid blocking dispatch */
if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
blk_mq_unquiesce_queue(ctrl->admin_q);
nvme_start_admin_queue(ctrl);
list_for_each_entry(ns, &ctrl->namespaces, list)
nvme_set_queue_dying(ns);
@ -4551,7 +4575,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
down_read(&ctrl->namespaces_rwsem);
list_for_each_entry(ns, &ctrl->namespaces, list)
blk_mq_quiesce_queue(ns->queue);
nvme_stop_ns_queue(ns);
up_read(&ctrl->namespaces_rwsem);
}
EXPORT_SYMBOL_GPL(nvme_stop_queues);
@ -4562,11 +4586,25 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
down_read(&ctrl->namespaces_rwsem);
list_for_each_entry(ns, &ctrl->namespaces, list)
blk_mq_unquiesce_queue(ns->queue);
nvme_start_ns_queue(ns);
up_read(&ctrl->namespaces_rwsem);
}
EXPORT_SYMBOL_GPL(nvme_start_queues);
void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
{
if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
blk_mq_quiesce_queue(ctrl->admin_q);
}
EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
{
if (test_and_clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
blk_mq_unquiesce_queue(ctrl->admin_q);
}
EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;

View File

@ -2382,7 +2382,7 @@ nvme_fc_ctrl_free(struct kref *ref)
list_del(&ctrl->ctrl_list);
spin_unlock_irqrestore(&ctrl->rport->lock, flags);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_start_admin_queue(&ctrl->ctrl);
blk_cleanup_queue(ctrl->ctrl.admin_q);
blk_cleanup_queue(ctrl->ctrl.fabrics_q);
blk_mq_free_tag_set(&ctrl->admin_tag_set);
@ -2510,7 +2510,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
/*
* clean up the admin queue. Same thing as above.
*/
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
nvme_stop_admin_queue(&ctrl->ctrl);
blk_sync_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
@ -3095,7 +3095,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
(ilog2(SZ_4K) - 9);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_start_admin_queue(&ctrl->ctrl);
ret = nvme_init_ctrl_finish(&ctrl->ctrl);
if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
@ -3249,7 +3249,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
nvme_fc_free_queue(&ctrl->queues[0]);
/* re-enable the admin_q so anything new can fast fail */
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_start_admin_queue(&ctrl->ctrl);
/* resume the io queues so that things will fast fail */
nvme_start_queues(&ctrl->ctrl);

Some files were not shown because too many files have changed in this diff Show More