Commit 0259d4498b ("bcache: move calc_cached_dev_sectors to proper
place on backing device detach") tries to fix calc_cached_dev_sectors
when bcache device detaches, but now we have:
cached_dev_detach_finish
...
bcache_device_detach(&dc->disk);
...
closure_put(&d->c->caching);
d->c = NULL; [*explicitly set dc->disk.c to NULL*]
list_move(&dc->list, &uncached_devices);
calc_cached_dev_sectors(dc->disk.c); [*passing a NULL pointer*]
...
Upper codeflows shows how bug happens, this patch fix the problem by
caching dc->disk.c beforehand, and cache_set won't be freed under us
because c->caching closure at least holds a reference count and closure
callback __cache_set_unregister only being called by bch_cache_set_stop
which using closure_queue(&c->caching), that means c->caching closure
callback for destroying cache_set won't be trigger by previous
closure_put(&d->c->caching).
So at this stage(while cached_dev_detach_finish is calling) it's safe to
access cache_set dc->disk.c.
Fixes: 0259d4498b ("bcache: move calc_cached_dev_sectors to proper place on backing device detach")
Signed-off-by: Lin Feng <linf@wangsu.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211112053629.3437-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dexuan reports that he's seeing spikes of very heavy CPU utilization when
running 24 disks and using the 'none' scheduler. This happens off the
sched restart path, because SCSI requires the queue to be restarted async,
and hence we're hammering on mod_delayed_work_on() to ensure that the work
item gets run appropriately.
Avoid hammering on the timer and just use queue_work_on() if no delay
has been specified.
Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The donation calculation logic assumes that the donor has non-zero
after-donation hweight, so the lowest active hweight a donating cgroup can
have is 2 so that it can donate 1 while keeping the other 1 for itself.
Earlier, we only donated from cgroups with sizable surpluses so this
condition was always true. However, with the precise donation algorithm
implemented, f1de2439ec ("blk-iocost: revamp donation amount
determination") made the donation amount calculation exact enabling even low
hweight cgroups to donate.
This means that in rare occasions, a cgroup with active hweight of 1 can
enter donation calculation triggering the following warning and then a
divide-by-zero oops.
WARNING: CPU: 4 PID: 0 at block/blk-iocost.c:1928 transfer_surpluses.cold+0x0/0x53 [884/94867]
...
RIP: 0010:transfer_surpluses.cold+0x0/0x53
Code: 92 ff 48 c7 c7 28 d1 ab b5 65 48 8b 34 25 00 ae 01 00 48 81 c6 90 06 00 00 e8 8b 3f fe ff 48 c7 c0 ea ff ff ff e9 95 ff 92 ff <0f> 0b 48 c7 c7 30 da ab b5 e8 71 3f fe ff 4c 89 e8 4d 85 ed 74 0
4
...
Call Trace:
<IRQ>
ioc_timer_fn+0x1043/0x1390
call_timer_fn+0xa1/0x2c0
__run_timers.part.0+0x1ec/0x2e0
run_timer_softirq+0x35/0x70
...
iocg: invalid donation weights in /a/b: active=1 donating=1 after=0
Fix it by excluding cgroups w/ active hweight < 2 from donating. Excluding
these extreme low hweight donations shouldn't affect work conservation in
any meaningful way.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: f1de2439ec ("blk-iocost: revamp donation amount determination")
Cc: stable@vger.kernel.org # v5.10+
Link: https://lore.kernel.org/r/Ybfh86iSvpWKxhVM@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 776b54e97a.
Looks like a last minute edit snuck into this patch, and as a result,
it doesn't even compile. Revert the change for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent
change_pid(PIDTYPE_PGID) that can move the task from one hlist
to another while iterating. Serialize ioprio_get to take
the tasklist_lock in this case, just like it's set counterpart.
Fixes: d69b78ba1d (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}())
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In driver/md/md.c, if the function autorun_array() is called,
the problem of double free may occur.
In function autorun_array(), when the function do_md_run() returns an
error, the function do_md_stop() will be called.
The function do_md_run() called function md_run(), but in function
md_run(), the pointer mddev->private may be freed.
The function do_md_stop() called the function __md_stop(), but in
function __md_stop(), the pointer mddev->private also will be freed
without judging null.
At this time, the pointer mddev->private will be double free, so it
needs to be judged null or not.
Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>
The superblock of version 1.0 doesn't get moved to the new position on a
device size change. This leads to a rdev without a superblock on a known
position, the raid can't be re-assembled.
The line was removed by mistake and is re-added by this patch.
Fixes: d9c0fa509e ("md: fix max sectors calculation for super 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
- set ana_log_size to 0 after freeing ana_log_buf (Hou Tao)
- show subsys nqn for duplicate cntlids (Keith Busch)
- disable namespace access for unsupported metadata (Keith Busch)
- report write pointer for a full zone as zone start + zone len
(Niklas Cassel)
- fix use after free when disconnecting a reconnecting ctrl
(Ruozhu Li)
- fix a list corruption in nvmet-tcp (Sagi Grimberg)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmGy8DMLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYP/AxAAwEmPSfDC6KvW0pOFglP14z73WakBZUs8OF7SosCr
+TEoCGtT8eeNvWAUnh2Ja59sSc1cUKAVeJL/DLj1jEap4RjrGtX6uVma2Hv52Inb
1YtSSQFW+dea5qq6aUIWJk8X9PSYFT3VW7sjOb11lrp5M38E3xckmzi1QWI9iX4P
qgVw86wdaOrhFO/sX9H5wlQEm7+ps7HXOjLiZOPrFVoggVca8BPpEyG1YRZPAQvr
NCnSgv6ciUY4zDfCQUYkT4vXEfEm7Y8Y9eDjOPlCaRuloHHYI6yhHgMgzvJgn2p7
WkFyai2y8RkSAGtsAtt93bi7mPpM6Zx3t3xXV8yIht/+uyT1eZGCy7feFIjeFEnT
GXtwRVCkHkhdsIuPR/GV5NAmtGb0sqaLdiMmOw3OCxfWBeqv4KizLROhZmTqNLP5
V50PXj4aJzyGQrqTbsrcyqfvPDm16HHM9EQwjC/YsaUqjpRHxFHuKrQ3iYb3+BdE
zx6gRlx5eMIvBAbdxCH5409XWiVcyBNuCw1zjCdeKT0PNQwWAP+HRzH0HVbaPdse
EnfVKy4r6VtlmFBWqKSKagVQMrULoohaAOmzVrEsqfDBw4OU+LKB3sw2iGlh4kFj
YW2N+Ey0CnYdkgbqiJ2Z7ahbzNSLcoykI8Gij4omrhT3yIgNCjwtNiokZ6gKOpO0
coA=
=Q+lW
-----END PGP SIGNATURE-----
Merge tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme into block-5.16
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.16
- set ana_log_size to 0 after freeing ana_log_buf (Hou Tao)
- show subsys nqn for duplicate cntlids (Keith Busch)
- disable namespace access for unsupported metadata (Keith Busch)
- report write pointer for a full zone as zone start + zone len
(Niklas Cassel)
- fix use after free when disconnecting a reconnecting ctrl
(Ruozhu Li)
- fix a list corruption in nvmet-tcp (Sagi Grimberg)"
* tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme:
nvmet-tcp: fix possible list corruption for unexpected command failure
nvme: fix use after free when disconnecting a reconnecting ctrl
nvme-multipath: set ana_log_size to 0 after free ana_log_buf
nvme: report write pointer for a full zone as zone start + zone len
nvme: disable namespace access for unsupported metadata
nvme: show subsys nqn for duplicate cntlids
nvmet_tcp_handle_req_failure needs to understand weather to prepare
for incoming data or the next pdu. However if we misidentify this, we
will wait for 0-length data, and queue the response although nvmet_req_init
already did that.
The particular command was namespace management command with no data,
which was incorrectly categorized as a command with incapsule data.
Also, add a code comment of what we are trying to do here.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882
CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
[...]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
iocb_put fs/aio.c:1161 [inline]
io_submit_one+0x496/0x2fe0 fs/aio.c:1882
__do_sys_io_submit fs/aio.c:1938 [inline]
__se_sys_io_submit fs/aio.c:1908 [inline]
__x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
__blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages()
directly, in which case upper layers won't be expecting ->ki_complete
to be called by the block layer and will terminate the request. However,
there is also bio_endio() leading to a second ->ki_complete and a double
free.
Fixes: 54a88eb838 ("block: add single bio async direct IO helper")
Reported-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A crash happens when trying to disconnect a reconnecting ctrl:
1) The network was cut off when the connection was just established,
scan work hang there waiting for some IOs complete. Those I/Os were
retried because we return BLK_STS_RESOURCE to blk in reconnecting.
2) After a while, I tried to disconnect this connection. This
procedure also hangs because it tried to obtain ctrl->scan_lock.
It should be noted that now we have switched the controller state
to NVME_CTRL_DELETING.
3) In nvme_check_ready(), we always return true when ctrl->state is
NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
device which was already freed.
To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
device only when queue state is live. If not, return host path error to
the block layer
Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Set ana_log_size to 0 when ana_log_buf is freed to make sure
nvme_mpath_init_identify will do the right thing when retrying
after an earlier failure.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The write pointer in NVMe ZNS is invalid for a zone in zone state full.
The same also holds true for ZAC/ZBC.
The current behavior for NVMe is to simply propagate the wp reported by
the drive, even for full zones. Since the wp is invalid for a full zone,
the wp reported by the drive may be any value.
The way that the sd_zbc driver handles a full zone is to always report
the wp as zone start + zone len, regardless of what the drive reported.
null_blk also follows this convention.
Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
pointer for a full zone in a consistent way, regardless of the interface
of the underlying zoned block device.
blkzone report before patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
blkzone report after patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0
non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The only fabrics target that supports metadata handling through the
separate integrity buffer is RDMA. It is currently usable only if the
size is 8B per block and formatted for protection information. If an
rdma target were to export a namespace with a different format (ex:
4k+64B), the driver will not be able to submit valid read/write commands
for that namespace.
Suppress setting the metadata feature in the namespace so that the
gendisk capacity will be set to 0. This will prevent read/write access
through the block stack, but will continue to allow ioctl passthrough
commands.
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The driver assigned nvme handle isn't persistent across reboots, so is
not enough information to match up where the collisions are occuring.
Add the subsys nqn string to the output so that it can more easily be
identified later.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215099
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
kernel test robot reported that RCU stall via printk() flooding is
possible [1] when stress testing.
Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If writeback isn't configured, then we get the following warning when
compiling zram:
drivers/block/zram/zram_drv.c:1824:45: warning: unused variable 'zram_wb_devops' [-Wunused-const-variable]
Make sure we only define the block_device_operations if that option is
enabled.
Link: https://lore.kernel.org/lkml/202111261614.gCJMqcyh-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need to call rq_qos_done() regardless of whether or not we're freeing
the request or not, as the reference count doesn't cover the IO completion
tracking.
Fixes: f794f3351f ("block: add support for blk_mq_end_request_batch()")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The build warning:
block/blk-core.c:968: warning: Function parameter or member 'iob'
not described in 'bio_poll'.
Fixes: 5a72e899ce ("block: add a struct io_comp_batch argument to fops->iopoll()")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- add a NO APST quirk for a Kioxia device (Enzo Matsumiya)
- fix write zeroes pi (Klaus Jensen)
- various TCP transport fixes (Maurizio Lombardi and Varun Prakash)
- ignore invalid fast_io_fail_tmo values (Maurizio Lombardi)
- use IOCB_NOWAIT only if the filesystem supports it (Maurizio Lombardi)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmGfl+ALHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYP5JA//RbalL93O4UPIHtgIlwQ0TfGW8dhKEwrOJ8etKRnq
VOeFfwY3pWVqAKgKwBW9+tBaim3+Kc6m3BDyN6jzWzM6V9Bx0uKkZTTZY8juYZeH
JolUc6XdAnf15Gvxv//9nS+bRZPUGclZ2bR09P4zqGqTcOVq96+mzxh+tNniMKVg
jcVTOLn+iyxfmMNITK01D871BexqB9d1PdaPGlJ4MJFGUtSlgepuqcIw+PS7ahLR
tHOM3idXGpK/9UAK5VXKLbVbkpn69ndgiOuSYO5miMXg+1yD3mZjQ1UCyyTSVwSY
ItlY+QA9WLeAcWh6RCqAXDKJszLjyAxbUEzhB+OUDBUIb88Eai71b4RHlMru+5gy
F21+l3I2adnjNmNkjSQ8tcYDsq/twih3cKx+LuwLdpha8DCpp64kGu6BucV+I+jr
Z2E7amAd6qEYrrM8ya8N1co77Mx6VW6v7OCM7U/m4wNQzXFh2+UgFVV4YCefMzrQ
qV7/VbXUXJrNa3uGHn334fVFAnYiy2N3M3ZXWpy504BLOYafJDEZj6ltNwLVgbXH
P2oaFEa4+7RRgfJP1E4kp8cRf3qQq0gJjMx8Zz7krCwhJnyj9t93+qgaGZ0SDZx3
6ld3elwu+xtfx+R0JnKsEVcJz9199nasowiEurSv7igA0W83J4HCQlq4PUn3QRjm
/dI=
=Hjjj
-----END PGP SIGNATURE-----
Merge tag 'nvme-5.16-2021-11-25' of git://git.infradead.org/nvme into block-5.16
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.16
- add a NO APST quirk for a Kioxia device (Enzo Matsumiya)
- fix write zeroes pi (Klaus Jensen)
- various TCP transport fixes (Maurizio Lombardi and Varun Prakash)
- ignore invalid fast_io_fail_tmo values (Maurizio Lombardi)
- use IOCB_NOWAIT only if the filesystem supports it (Maurizio Lombardi)"
* tag 'nvme-5.16-2021-11-25' of git://git.infradead.org/nvme:
nvmet: use IOCB_NOWAIT only if the filesystem supports it
nvme: fix write zeroes pi
nvme-fabrics: ignore invalid fast_io_fail_tmo values
nvme-pci: add NO APST quirk for Kioxia device
nvme-tcp: fix memory leak when freeing a queue
nvme-tcp: validate R2T PDU in nvme_tcp_handle_r2t()
nvmet-tcp: fix incomplete data digest send
nvmet-tcp: fix memory leak when performing a controller reset
nvmet-tcp: add an helper to free the cmd buffers
nvmet-tcp: fix a race condition between release_queue and io_work
Submit I/O requests with the IOCB_NOWAIT flag set only if
the underlying filesystem supports it.
Fixes: 50a909db36 ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Write Zeroes sets PRACT when block integrity is enabled (as it should),
but neglects to also set the reftag which is expected by reads. This
causes protection errors on reads.
Fix this by setting the reftag for type 1 and 2 (for type 3, reads will
not check the reftag).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This particular Kioxia device times out and aborts I/O during any load,
but it's more easily observable with discards (fstrim).
The device gets to a state that is also not possible to use
"nvme set-feature" to disable APST.
Booting with nvme_core.default_ps_max_latency=0 solves the issue.
We had a dozen or so of these devices behaving this same way in
customer environments.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Release the page frag cache when tearing down the io queues
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If maxh2cdata < r2t_length then driver will form multiple
H2CData PDUs, validate R2T PDU in nvme_tcp_handle_r2t() to
reuse nvme_tcp_setup_h2c_data_pdu().
Also set req->state to NVME_TCP_SEND_H2C_PDU in
nvme_tcp_setup_h2c_data_pdu().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Current nvmet_try_send_ddgst() code does not check whether
all data digest bytes are transmitted, fix this by returning
-EAGAIN if all data digest bytes are not transmitted.
Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If a reset controller is executed while the initiator
is performing some I/O the driver may leak the memory allocated
for the commands' iovec.
Make sure that nvmet_tcp_uninit_data_in_cmds() releases
all the memory.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Makes the code easier to read and to debug.
Sets the freed pointers to NULL, it will be useful
when destroying the queues to understand if the commands'
buffers have been released already or not.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If the initiator executes a reset controller operation while
performing I/O, the target kernel will crash because of a race condition
between release_queue and io_work;
nvmet_tcp_uninit_data_in_cmds() may be executed while io_work
is running, calling flush_work() was not sufficient to
prevent this because io_work could requeue itself.
Fix this bug by using cancel_work_sync() to prevent io_work
from requeuing itself and set rcv_state to NVMET_TCP_RECV_ERR to
make sure we don't receive any more data from the socket.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
disk->fops->owner is grabbed in blkdev_get_no_open() after the disk
kobject refcount is increased. This way can't make sure that
disk->fops->owner is still alive since del_gendisk() still can move
on if the kobject refcount of disk is grabbed by open() and
disk->fops->open() isn't called yet.
Fixes the issue by moving try_module_get() into blkdev_get_by_dev()
with ->open_mutex() held, then we can drain the in-progress open()
in del_gendisk(). Meantime new open() won't succeed because disk
becomes not alive.
This way is reasonable because blkdev_get_no_open() needn't to touch
disk->fops or defined callbacks.
Cc: Christoph Hellwig <hch@lst.de>
Cc: czhong@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111020343.316126-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never insert flush request into scheduler queue before.
Recently commit d92ca9d834 ("blk-mq: don't handle non-flush requests in
blk_insert_flush") tries to handle FUA data request as normal request.
This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
case of kyber since RQF_ELVPRIV isn't set for flush request, then
->finish_request won't be called.
Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
when the device supports FUA, just like what we did before.
[1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: d92ca9d834 ("blk-mq: don't handle non-flush requests in blk_insert_flush")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If blk_queue_enter() failed due to queue is dying, the
blkdev_put_no_open() is needed because blkcg_conf_open_bdev() succeeded.
Fixes: 0c9d338c84 ("blk-cgroup: synchronize blkg creation against policy deactivation")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20211102020705.2321858-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
elevator_init_mq() is only called before adding disk, when there isn't
any FS I/O, only passthrough requests can be queued, so freezing queue
plus canceling dispatch work is enough to drain any dispatch activities,
then we can avoid synchronize_srcu() in blk_mq_quiesce_queue().
Long boot latency issue can be fixed in case of lots of disks added
during booting.
Fixes: 737eb78e82 ("block: Delay default elevator initialization")
Reported-by: yangerkun <yangerkun@huawei.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211117115502.1600950-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit d07f3b081e.
pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b48
("pstore/blk: Use the normal block device I/O path"), which landed in
the same release as the commit adding BROKEN.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we fail the submission queue checks, we don't put the queue afterwards.
This can cause various issues like stalls on scheduler switch or failure
to remove the device, or like in the original bug report, timeout waiting
for the device on reboot/restart.
While in there, fix a few whitespace discrepancies in the surrounding
code.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215039
Fixes: b637108a40 ("blk-mq: fix filesystem I/O request allocation")
Reported-and-tested-by: Stephen Smith <stephenmsmith@blueyonder.co.uk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.
The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.
Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.
This concludes a long journey and now we are finally getting rid
of the unintentional fallthrough bug-class in the kernel, entirely. :)
Link: 9ed4a94d64 [1]
Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2]
Link: https://github.com/KSPP/linux/issues/115
Link: https://github.com/ClangBuiltLinux/linux/issues/236
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Clean up open-coded swap() calls.
* A little bit of #ifdef golf to complete the reunification of the
kernel and userspace libxfs source code.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGNT1IACgkQ+H93GTRK
tOucKA//Qk2NX3QBm/8pCrFE5V+eqooPANhZmzeviJCN/6++jcNOy0f+YK6JXVRC
U2WdotHFH5fF6lsDkzNtMPHZ8JMZmOfEiPx5CGFiWT5iUW7FbLkROHm7GFtbwMoH
qm3Lt7PbdSzJqTuOTvaGCw1xWkjDXMLdsdFM7mx3JO5zT9a/fCqjjmyR2Kl0qcSP
RzfruVe20wUka2BeaXfZzSasgfLswratkU4xsiNiwA37yQaldzhrg8fg6uP3OSYi
dkWFXi6WdWwQzARnjWNPwigUwA3xVaYgV+I6+ME0DYsUBvywZzUg3pkowhRAHyA9
kv86L5Zt5K7kQcVqyd+lIvIuAcGrOZ9hA18PIXnwahLBqmjcqAJoF9XhTTZDMD4J
LfujGMrf7DSDcf0vH8G9wlQQthsPGUOoFia5rr8MhdVVNee/b1Qvwsh7kmyg0DOK
9WuNQxGPd7s+X+kwdmGrK7E6fqyPwEfC43l8wtCiBIyGz6QcorwD7kH9DGzv5xGF
NX7WQeKvcaoXn1XVfonb0YgdVOnbyqK4AiY3Po1Ood3IxGyiLGCgDnusvYu+C9/r
T0rRMbljkX1lUKqfzGkg2egOKPR+8RFgFKrKNSXUkDxl8TFLRd3ZObowPPlohq1I
9lIIirip5UFYRv+7srDU1oZPWkvwkpJmaMFgagD3w+OWdo6zwao=
=wsFu
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs cleanups from Darrick Wong:
"The most 'exciting' aspect of this branch is that the xfsprogs
maintainer and I have worked through the last of the code
discrepancies between kernel and userspace libxfs such that there are
no code differences between the two except for #includes.
IOWs, diff suffices to demonstrate that the userspace tools behave the
same as the kernel, and kernel-only bits are clearly marked in the
/kernel/ source code instead of just the userspace source.
Summary:
- Clean up open-coded swap() calls.
- A little bit of #ifdef golf to complete the reunification of the
kernel and userspace libxfs source code"
* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: sync xfs_btree_split macros with userspace libxfs
xfs: #ifdef out perag code for userspace
xfs: use swap() to make dabtree code cleaner
Fix a build error in stracktrace.c, fix resolving of addresses to
function names in backtraces, fix single-stepping in assembly code
and flush userspace pte's when using set_pte_at().
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYZAyIAAKCRD3ErUQojoP
X8zUAQCV4fijXlUEOnZorH42QsSvs1SowHXu2YdHU8CmauWVawEAt14Zj3fmMBcS
+uSacieZROU9maQtGgJAYHZVJwgPyQY=
=OQRw
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull more parisc fixes from Helge Deller:
"Fix a build error in stracktrace.c, fix resolving of addresses to
function names in backtraces, fix single-stepping in assembly code and
flush userspace pte's when using set_pte_at()"
* tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/entry: fix trace test in syscall exit path
parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page
parisc: Fix implicit declaration of function '__kernel_text_address'
parisc: Fix backtrace to always include init funtion names
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJhkUvoAAoJELcQ+SIFb8HaQG8H/1pF/zhT+xYAK7dMOYfdyl9R
UGgnNGR2kjPDhsAkMHIYWVopP2KFV4kA4TCe8UcEt4GPeWZgGjV8lIaHJa/kdUsH
5tyRYtC8ZtVhJMVJQ8jZYZcTwOqYFFGtCksPq6/j0dHE0Gf+nQStSXNZqi8/XxE9
c3jDg+l7PVwdl7LWFedPVXcflhD5JLrRhyA/17V0INWrqP/MboXGRyq/Yluksw7x
C89ZXrgRE+5tN9icfRT4pl/XBpezWQkjuvsyHM9DWOjqICXfng8yeB5MZ/+R2uyF
qbXRSGqICOFhRRV9m/L0kypCYYiaZ8bBiYvLeGNggmpUJVMosIDFpS7kiYQavuA=
=dnw0
-----END PGP SIGNATURE-----
Merge tag 'sh-for-5.16' of git://git.libc.org/linux-sh
Pull arch/sh updates from Rich Felker.
* tag 'sh-for-5.16' of git://git.libc.org/linux-sh:
sh: pgtable-3level: Fix cast to pointer from integer of different size
sh: fix READ/WRITE redefinition warnings
sh: define __BIG_ENDIAN for math-emu
sh: math-emu: drop unused functions
sh: fix kconfig unmet dependency warning for FRAME_POINTER
sh: Cleanup about SPARSE_IRQ
sh: kdump: add some attribute to function
maple: fix wrong return value of maple_bus_init().
sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/
sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y
sh: boards: Fix the cacography in irq.c
sh: check return code of request_irq
sh: fix trivial misannotations
- Fix early_iounmap
- Drop cc-option fallbacks for architecture selection
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmGRL48ACgkQ9OeQG+St
rGQyXA//ZuMjsA615TncG5686glPds6y+SoiWAPosz9ElGawL5Ke9vybIBPxYrZj
gSaR4aiyoCy4B22tSQcAuyd8Ai8+cX5jUv7fbgZcCWWQto0OiD3S1jpaGwaPN+3I
1fwbUvUjrdzzDfGHJbFwBYykkQ8bPWrnjtZ9EK8g9t7QuetqkZrihw/gfqHeflnu
Ad9WqEiHWDguOU9kdT6uGZf5RpfjwoLuxYNMSQVktQw81+Lk0GGfEroR2ZqUZAf4
YIrhaKTxjXfkm4Hv6bfLtSYxWY0sw1ziO0qhtEJHKQyE5hbMOy5Dhdq9Ntefkin1
/+OaNZzLiKuv6ZqWJjg5ViXc0pVgDNfLKbj8aicGW3966F13OISq9MIy9pkivK+2
R7ROhcdkJ4Tz6sIbAZBxmyFbgY+sjZ0F2OoOkjcLUCfyMGqfl8FohNd6EvSs0yP8
6zfnUOg4vBkLg80d/lg6RjSszqOSdFjW4Qbi8HQoDYk9bemLPtVxQnjYsd2RMZhM
oKTkNTcA17MUCYncw6xu2lL3ahz15XMxBf/L2mfFwFCIplKvpTIYbP5BWSisjiyz
JhbjLj9PaCOmubu+n2JKK4dm0GeEngWhu5CW0xN0eQ0at8CJa9dmt5y3V1V8//jj
F3Xltms1d67nHr+6pkE+EYp4YMTMBO1L7opsktTs1vCIJ2trTWM=
=Ksk/
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Fix early_iounmap
- Drop cc-option fallbacks for architecture selection
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9156/1: drop cc-option fallbacks for architecture selection
ARM: 9155/1: fix early early_iounmap()
- 2 fixes due to DT node name changes on Arm, Ltd. boards
- Treewide rename of Ingenic CGU headers
- Update ST email addresses
- Remove Netlogic DT bindings
- Dropping few more cases of redundant 'maxItems' in schemas
- Convert toshiba,tc358767 bridge binding to schema
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmGRN/wQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw0EXD/0byDq/gx0BOgSY18wOWp0W3tnJiudRrKNk
8+TqfphpridGSzZryboBPQ3U06eOT1pMV9EBzHfqeb87nIneMxZ26KhgMPRFf7qt
87G2tyfq4Itd59HATC/8xsq/5uiCONksbPojhjn6SrI4wLBzTegIYG5ZiocQw2tP
vcW9wDAOfcRVfXBtQwmYD7nkeShaoTrv+EBcAFhg4XB43TWyezCCBqvAz0qDSl7q
N/9xomgsB/fG1ImL/UWjPuPix64I9mrop7d8+C17980RJM10e5wL/eDWpbvWxqy6
R7nTHT9HWnbcV5ywgBTh3W6iQw9EGyytD0Uzw3v+Meoga1/zCBkF48mEin61nY1c
i44os73B9mwSZo34qysrFkir+NBNRBAwdP7z+GrarB9twlcIdfjfInQYWlNP73Yb
zJ/XIVrfn6GsEZRdgXXK5VHL0ffDYzwoGEHSU6m0cTJI+iNQT4WQ89HkhMHG1q0d
pRwKCE75mhMrwvniBYinL8I8LuMAJPZZKDpc6ZtBJdUdF6MFgXccJFdIBoho1TVL
oHkaXWwUjkGXliTdWX7UJk4zS4zNAyB1jLT9uHMrLvX7uVM1Dsy9bYWaU4ABB0xr
viT/gj/nawCbGbniytZEfAAe1bsnZOLqxmqe3XGUgYuOPQLB2xSSjkrr7HYmuLmx
1AskvHxO0w==
=ukBC
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Two fixes due to DT node name changes on Arm, Ltd. boards
- Treewide rename of Ingenic CGU headers
- Update ST email addresses
- Remove Netlogic DT bindings
- Dropping few more cases of redundant 'maxItems' in schemas
- Convert toshiba,tc358767 bridge binding to schema
* tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: watchdog: sunxi: fix error in schema
bindings: media: venus: Drop redundant maxItems for power-domain-names
dt-bindings: Remove Netlogic bindings
clk: versatile: clk-icst: Ensure clock names are unique
of: Support using 'mask' in making device bus id
dt-bindings: treewide: Update @st.com email address to @foss.st.com
dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml
dt-bindings: media: Update maintainers for st,stm32-cec.yaml
dt-bindings: mfd: timers: Update maintainers for st,stm32-timers
dt-bindings: timer: Update maintainers for st,stm32-timer
dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml
dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
timer delivery stops working for a new child task because copy_process()
copies state information which is only valid for the parent task.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDVUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocOFD/42NOdli73N+Jdq7APHUIHXzu+6DVT6
CI5toLQw+0KPoF0s1wg4+J0YCDt2k0Pu4lOabF3Ze/+c6RlR5zfCXESqsXdHaCjh
E91Vs57u0ataRMEHo6KB6eBIutuF8hyxfY6vVXfkTRNAreUIWiwWYrlB0G64JVOG
+/l1W7adovjLcLwcW+ArrnLJwkBKtXunK6PVv2IrdRHwpMHbwoNRCCCFvzkqnWmQ
4Yy2/NaB/PEBK5kezP1/j9EMcGCTWk1JJIm+l/PEwCCcbIgIdUahpW3XHAaqms6R
oukqCvE5ukfmVzBFYBhCamhF8heyEeBVRqGU+Yyk48+I+DQFBCqaqa1NKSuEUdNL
Nycy6Rp1yn7CHVSB461shMS6NJGOSNDBjv7vxer3WjV3HPJu7y0RrN7jXbkSfQnm
hVKjkmbDEYwylgzFE5+T857NqD5MEXeuIBtTO08hNRnpd61aB3x+qq+8ElE6ST8Y
pm6rMzw0AZ5buPK8QdGVDk0dD4WKObj1LzmRZvBtYeWynO6sxyKUl6B2CgAxrvn5
D1Li2/arkJMCVeIuIL5uE6DPoxSh8J7OuEC4KeWX8M8xQSEDImqfZ+tDL2Esv6jv
xDmymq584hiCBc1CJjCOA9kZYe6KNXC7lkVOns6GaKKzLhkrcvUR3dUGhMyzxAMO
t9QIAinR6JwRRA==
=EBbc
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because
copy_process() copies state information which is only valid for the
parent task"
* tag 'timers-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
- Core code:
A regression fix for the Open Firmware interrupt mapping code where a
interrupt controller property in a node caused a map property in the
same node to be ignored.
- Interrupt chip drivers:
- Workaround a limitation in SiFive PLIC interrupt chip which silently
ignores an EOI when the interrupt line is masked.
- Provide the missing mask/unmask implementation for the CSKY MP
interrupt controller.
- PCI/MSI:
- Prevent a use after free when PCI/MSI interrupts are released by
destroying the sysfs entries before freeing the memory which is
accessed in the sysfs show() function.
- Implement a mask quirk for the Nvidia ION AHCI chip which does not
advertise masking capability despite implementing it. Even worse the
chip comes out of reset with all MSI entries masked, which due to the
missing masking capability never get unmasked.
- Move the check which prevents accessing the MSI[X] masking for XEN
back into the low level accessors. The recent consolidation missed
that these accessors can be invoked from places which do not have
that check which broke XEN. Move them back to he original place
instead of sprinkling tons of these checks all over the code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGRDCsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTL5D/4n7CUudohHPckr0Rl3LbnSUfyY9g3H
irTKur71AT392YerJtQp+WBp3AKYMDD8wPTgydfpWe95ouIjx5jhb/co7uSifG6k
ZssXYS10bkvjqyS8E2s5FnA5xbnagunK/R981qju14Ec39xqx1JzlUnO/Pra0Kcr
5rBV7br9jJMBleBI4OFuS9fS8dVL1MH/yushkuDNfIKEnaElnaxaYUk/ZdzkMMAW
lt1B+dPhK24t1hXQvZKp/iVQUGrJWdzzy9aDiUYPv1IZP+V5nbLMgmFvEv8jNdNa
6kkfp0l30nXM9rgvcp2KkasVUPVhurVEwitzz9+tT6LRA+/kSwi2yx8/FwCVUcL6
xD0AgKQgxOj/WwGJTZswvPu3afsLuw3rGmx5uH1IV40P9mPX0AiHWgvoaInHjzlJ
QKFQ7mJEuUcC6cJ36RGqX9njhKvPIcUENGCTjGSffcXsWltPrOCg2mQFcsDa9fSH
qPfXDVv4YINI+0MAlOULh6TLWQ07xy37HiskJu/AgILOfipoDi8pXdqNJRfvxB1S
D3O8vB+SH3lPj69w4dtj7539SdNZn8CCyN3RbNlstl2vHV5Bus3cVk0CcOhG8qNW
KwK/tSH8O0ZYHAsUu8OqBipXy6qOPi/10MJQn3NOpvvOmS4oDd+82bq+jp5qJpsG
42WNuzEoBdaUiA==
=LBQL
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem
Core code:
- A regression fix for the Open Firmware interrupt mapping code where
a interrupt controller property in a node caused a map property in
the same node to be ignored.
Interrupt chip drivers:
- Workaround a limitation in SiFive PLIC interrupt chip which
silently ignores an EOI when the interrupt line is masked.
- Provide the missing mask/unmask implementation for the CSKY MP
interrupt controller.
PCI/MSI:
- Prevent a use after free when PCI/MSI interrupts are released by
destroying the sysfs entries before freeing the memory which is
accessed in the sysfs show() function.
- Implement a mask quirk for the Nvidia ION AHCI chip which does not
advertise masking capability despite implementing it. Even worse
the chip comes out of reset with all MSI entries masked, which due
to the missing masking capability never get unmasked.
- Move the check which prevents accessing the MSI[X] masking for XEN
back into the low level accessors. The recent consolidation missed
that these accessors can be invoked from places which do not have
that check which broke XEN. Move them back to he original place
instead of sprinkling tons of these checks all over the code"
* tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
of/irq: Don't ignore interrupt-controller when interrupt-map failed
irqchip/sifive-plic: Fixup EOI failed when masked
irqchip/csky-mpintc: Fixup mask/unmask implementation
PCI/MSI: Destroy sysfs before freeing entries
PCI: Add MSI masking quirk for Nvidia ION AHCI
PCI/MSI: Deal with devices lying about their MSI mask capability
PCI/MSI: Move non-mask check back into low level accessors