linux-stable/drivers/scsi/hosts.c

727 lines
18 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* hosts.c Copyright (C) 1992 Drew Eckhardt
* Copyright (C) 1993, 1994, 1995 Eric Youngdale
* Copyright (C) 2002-2003 Christoph Hellwig
*
* mid to lowlevel SCSI driver interface
* Initial versions: Drew Eckhardt
* Subsequent revisions: Eric Youngdale
*
* <drew@colorado.edu>
*
* Jiffies wrap fixes (host->resetting), 3 Dec 1998 Andrea Arcangeli
* Added QLOGIC QLA1280 SCSI controller kernel host support.
* August 4, 1999 Fred Lewis, Intel DuPont
*
* Updated to reflect the new initialization scheme for the higher
* level of scsi drivers (sd/sr/st)
* September 17, 2000 Torben Mathiasen <tmm@image.dk>
*
* Restructured scsi_host lists and associated functions.
* September 04, 2002 Mike Anderson (andmike@us.ibm.com)
*/
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/kernel.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 08:04:11 +00:00
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/init.h>
#include <linux/completion.h>
#include <linux/transport_class.h>
#include <linux/platform_device.h>
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 14:41:42 +00:00
#include <linux/pm_runtime.h>
#include <linux/idr.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_transport.h>
#include <scsi/scsi_cmnd.h>
#include "scsi_priv.h"
#include "scsi_logging.h"
static int shost_eh_deadline = -1;
module_param_named(eh_deadline, shost_eh_deadline, int, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(eh_deadline,
"SCSI EH timeout in seconds (should be between 0 and 2^31-1)");
static DEFINE_IDA(host_index_ida);
static void scsi_host_cls_release(struct device *dev)
{
put_device(&class_to_shost(dev)->shost_gendev);
}
static struct class shost_class = {
.name = "scsi_host",
.dev_release = scsi_host_cls_release,
};
/**
* scsi_host_set_state - Take the given host through the host state model.
* @shost: scsi host to change the state of.
* @state: state to change to.
*
* Returns zero if unsuccessful or an error if the requested
* transition is illegal.
**/
int scsi_host_set_state(struct Scsi_Host *shost, enum scsi_host_state state)
{
enum scsi_host_state oldstate = shost->shost_state;
if (state == oldstate)
return 0;
switch (state) {
case SHOST_CREATED:
/* There are no legal states that come back to
* created. This is the manually initialised start
* state */
goto illegal;
case SHOST_RUNNING:
switch (oldstate) {
case SHOST_CREATED:
case SHOST_RECOVERY:
break;
default:
goto illegal;
}
break;
case SHOST_RECOVERY:
switch (oldstate) {
case SHOST_RUNNING:
break;
default:
goto illegal;
}
break;
case SHOST_CANCEL:
switch (oldstate) {
case SHOST_CREATED:
case SHOST_RUNNING:
case SHOST_CANCEL_RECOVERY:
break;
default:
goto illegal;
}
break;
case SHOST_DEL:
switch (oldstate) {
case SHOST_CANCEL:
case SHOST_DEL_RECOVERY:
break;
default:
goto illegal;
}
break;
case SHOST_CANCEL_RECOVERY:
switch (oldstate) {
case SHOST_CANCEL:
case SHOST_RECOVERY:
break;
default:
goto illegal;
}
break;
case SHOST_DEL_RECOVERY:
switch (oldstate) {
case SHOST_CANCEL_RECOVERY:
break;
default:
goto illegal;
}
break;
}
shost->shost_state = state;
return 0;
illegal:
SCSI_LOG_ERROR_RECOVERY(1,
shost_printk(KERN_ERR, shost,
"Illegal host state transition"
"%s->%s\n",
scsi_host_state_name(oldstate),
scsi_host_state_name(state)));
return -EINVAL;
}
/**
* scsi_remove_host - remove a scsi host
* @shost: a pointer to a scsi host to remove
**/
void scsi_remove_host(struct Scsi_Host *shost)
{
unsigned long flags;
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 14:41:42 +00:00
mutex_lock(&shost->scan_mutex);
spin_lock_irqsave(shost->host_lock, flags);
if (scsi_host_set_state(shost, SHOST_CANCEL))
if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
spin_unlock_irqrestore(shost->host_lock, flags);
mutex_unlock(&shost->scan_mutex);
return;
}
spin_unlock_irqrestore(shost->host_lock, flags);
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 14:41:42 +00:00
scsi_autopm_get_host(shost);
flush_workqueue(shost->tmf_work_q);
scsi_forget_host(shost);
mutex_unlock(&shost->scan_mutex);
scsi_proc_host_rm(shost);
spin_lock_irqsave(shost->host_lock, flags);
if (scsi_host_set_state(shost, SHOST_DEL))
BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
spin_unlock_irqrestore(shost->host_lock, flags);
transport_unregister_device(&shost->shost_gendev);
device_unregister(&shost->shost_dev);
device_del(&shost->shost_gendev);
}
EXPORT_SYMBOL(scsi_remove_host);
/**
* scsi_add_host_with_dma - add a scsi host with dma device
* @shost: scsi host pointer to add
* @dev: a struct device of type scsi class
* @dma_dev: dma device for the host
*
* Note: You rarely need to worry about this unless you're in a
* virtualised host environments, so use the simpler scsi_add_host()
* function instead.
*
* Return value:
* 0 on success / != 0 for error
**/
int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
struct device *dma_dev)
{
struct scsi_host_template *sht = shost->hostt;
int error = -EINVAL;
shost_printk(KERN_INFO, shost, "%s\n",
sht->info ? sht->info(shost) : sht->name);
if (!shost->can_queue) {
shost_printk(KERN_ERR, shost,
"can_queue = 0 no longer supported\n");
goto fail;
}
shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,
shost->can_queue);
error = scsi_init_sense_cache(shost);
if (error)
goto fail;
error = scsi_mq_setup_tags(shost);
if (error)
goto fail;
scsi: add support for a blk-mq based I/O path. This patch adds support for an alternate I/O path in the scsi midlayer which uses the blk-mq infrastructure instead of the legacy request code. Use of blk-mq is fully transparent to drivers, although for now a host template field is provided to opt out of blk-mq usage in case any unforseen incompatibilities arise. In general replacing the legacy request code with blk-mq is a simple and mostly mechanical transformation. The biggest exception is the new code that deals with the fact the I/O submissions in blk-mq must happen from process context, which slightly complicates the I/O completion handler. The second biggest differences is that blk-mq is build around the concept of preallocated requests that also include driver specific data, which in SCSI context means the scsi_cmnd structure. This completely avoids dynamic memory allocations for the fast path through I/O submission. Due the preallocated requests the MQ code path exclusively uses the host-wide shared tag allocator instead of a per-LUN one. This only affects drivers actually using the block layer provided tag allocator instead of their own. Unlike the old path blk-mq always provides a tag, although drivers don't have to use it. For now the blk-mq path is disable by defauly and must be enabled using the "use_blk_mq" module parameter. Once the remaining work in the block layer to make blk-mq more suitable for slow devices is complete I hope to make it the default and eventually even remove the old code path. Based on the earlier scsi-mq prototype by Nicholas Bellinger. Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and various sugestions and code contributions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
2014-01-17 11:06:53 +00:00
if (!shost->shost_gendev.parent)
shost->shost_gendev.parent = dev ? dev : &platform_bus;
if (!dma_dev)
dma_dev = shost->shost_gendev.parent;
shost->dma_dev = dma_dev;
/*
* Increase usage count temporarily here so that calling
* scsi_autopm_put_host() will trigger runtime idle if there is
* nothing else preventing suspending the device.
*/
pm_runtime_get_noresume(&shost->shost_gendev);
[SCSI] implement runtime Power Management This patch (as1398b) adds runtime PM support to the SCSI layer. Only the machanism is provided; use of it is up to the various high-level drivers, and the patch doesn't change any of them. Except for sg -- the patch expicitly prevents a device from being runtime-suspended while its sg device file is open. The implementation is simplistic. In general, hosts and targets are automatically suspended when all their children are asleep, but for them the runtime-suspend code doesn't actually do anything. (A host's runtime PM status is propagated up the device tree, though, so a runtime-PM-aware lower-level driver could power down the host adapter hardware at the appropriate times.) There are comments indicating where a transport class might be notified or some other hooks added. LUNs are runtime-suspended by calling the drivers' existing suspend handlers (and likewise for runtime-resume). Somewhat arbitrarily, the implementation delays for 100 ms before suspending an eligible LUN. This is because there typically are occasions during bootup when the same device file is opened and closed several times in quick succession. The way this all works is that the SCSI core increments a device's PM-usage count when it is registered. If a high-level driver does nothing then the device will not be eligible for runtime-suspend because of the elevated usage count. If a high-level driver wants to use runtime PM then it can call scsi_autopm_put_device() in its probe routine to decrement the usage count and scsi_autopm_get_device() in its remove routine to restore the original count. Hosts, targets, and LUNs are not suspended while they are being probed or removed, or while the error handler is running. In fact, a fairly large part of the patch consists of code to make sure that things aren't suspended at such times. [jejb: fix up compile issues in PM config variations] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-06-17 14:41:42 +00:00
pm_runtime_set_active(&shost->shost_gendev);
pm_runtime_enable(&shost->shost_gendev);
device_enable_async_suspend(&shost->shost_gendev);
error = device_add(&shost->shost_gendev);
if (error)
goto out_disable_runtime_pm;
scsi_host_set_state(shost, SHOST_RUNNING);
get_device(shost->shost_gendev.parent);
device_enable_async_suspend(&shost->shost_dev);
get_device(&shost->shost_gendev);
error = device_add(&shost->shost_dev);
if (error)
goto out_del_gendev;
if (shost->transportt->host_size) {
shost->shost_data = kzalloc(shost->transportt->host_size,
GFP_KERNEL);
if (shost->shost_data == NULL) {
error = -ENOMEM;
goto out_del_dev;
}
}
if (shost->transportt->create_work_queue) {
snprintf(shost->work_q_name, sizeof(shost->work_q_name),
"scsi_wq_%d", shost->host_no);
shost->work_q = alloc_workqueue("%s",
WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
1, shost->work_q_name);
if (!shost->work_q) {
error = -EINVAL;
goto out_del_dev;
}
}
error = scsi_sysfs_add_host(shost);
if (error)
goto out_del_dev;
scsi_proc_host_add(shost);
scsi_autopm_put_host(shost);
return error;
/*
* Any host allocation in this function will be freed in
* scsi_host_dev_release().
*/
out_del_dev:
device_del(&shost->shost_dev);
out_del_gendev:
/*
* Host state is SHOST_RUNNING so we have to explicitly release
* ->shost_dev.
*/
put_device(&shost->shost_dev);
device_del(&shost->shost_gendev);
out_disable_runtime_pm:
device_disable_async_suspend(&shost->shost_gendev);
pm_runtime_disable(&shost->shost_gendev);
pm_runtime_set_suspended(&shost->shost_gendev);
pm_runtime_put_noidle(&shost->shost_gendev);
fail:
return error;
}
EXPORT_SYMBOL(scsi_add_host_with_dma);
static void scsi_host_dev_release(struct device *dev)
{
struct Scsi_Host *shost = dev_to_shost(dev);
struct device *parent = dev->parent;
scsi_proc_hostdir_rm(shost->hostt);
/* Wait for functions invoked through call_rcu(&scmd->rcu, ...) */
rcu_barrier();
if (shost->tmf_work_q)
destroy_workqueue(shost->tmf_work_q);
if (shost->ehandler)
kthread_stop(shost->ehandler);
if (shost->work_q)
destroy_workqueue(shost->work_q);
Fix a memory leak in scsi_host_dev_release() Avoid that kmemleak reports the following memory leak if a SCSI LLD calls scsi_host_alloc() and scsi_host_put() but neither scsi_host_add() nor scsi_host_remove(). The following shell command triggers that scenario: for ((i=0; i<2; i++)); do srp_daemon -oac | while read line; do echo $line >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target done done unreferenced object 0xffff88021b24a220 (size 8): comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s) hex dump (first 8 bytes): 68 6f 73 74 35 38 00 a5 host58.. backtrace: [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0 [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160 [<ffffffff81260d2b>] kvasprintf+0x5b/0x90 [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0 [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0 [<ffffffff81337e3c>] dev_set_name+0x3c/0x40 [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0 [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp] [<ffffffff8133778b>] dev_attr_store+0x1b/0x20 [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60 [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180 [<ffffffff81176eef>] __vfs_write+0x2f/0xf0 [<ffffffff811771e4>] vfs_write+0xa4/0x100 [<ffffffff81177c64>] SyS_write+0x54/0xc0 [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: stable <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-18 22:56:36 +00:00
if (shost->shost_state == SHOST_CREATED) {
/*
* Free the shost_dev device name here if scsi_host_alloc()
* and scsi_host_put() have been called but neither
* scsi_host_add() nor scsi_host_remove() has been called.
* This avoids that the memory allocated for the shost_dev
* name is leaked.
*/
kfree(dev_name(&shost->shost_dev));
}
if (shost->tag_set.tags)
scsi_mq_destroy_tags(shost);
kfree(shost->shost_data);
ida_simple_remove(&host_index_ida, shost->host_no);
if (shost->shost_state != SHOST_CREATED)
put_device(parent);
kfree(shost);
}
static struct device_type scsi_host_type = {
.name = "scsi_host",
.release = scsi_host_dev_release,
};
/**
* scsi_host_alloc - register a scsi host adapter instance.
* @sht: pointer to scsi host template
* @privsize: extra bytes to allocate for driver
*
* Note:
* Allocate a new Scsi_Host and perform basic initialization.
* The host is not published to the scsi midlayer until scsi_add_host
* is called.
*
* Return value:
* Pointer to a new Scsi_Host
**/
struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
{
struct Scsi_Host *shost;
int index;
shost = kzalloc(sizeof(struct Scsi_Host) + privsize, GFP_KERNEL);
if (!shost)
return NULL;
shost->host_lock = &shost->default_lock;
spin_lock_init(shost->host_lock);
shost->shost_state = SHOST_CREATED;
INIT_LIST_HEAD(&shost->__devices);
INIT_LIST_HEAD(&shost->__targets);
INIT_LIST_HEAD(&shost->eh_cmd_q);
INIT_LIST_HEAD(&shost->starved_list);
init_waitqueue_head(&shost->host_wait);
mutex_init(&shost->scan_mutex);
index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
if (index < 0) {
kfree(shost);
return NULL;
}
shost->host_no = index;
shost->dma_channel = 0xff;
/* These three are default values which can be overridden */
shost->max_channel = 0;
shost->max_id = 8;
shost->max_lun = 8;
/* Give each shost a default transportt */
shost->transportt = &blank_transport_template;
/*
* All drivers right now should be able to handle 12 byte
* commands. Every so often there are requests for 16 byte
* commands, but individual low-level drivers need to certify that
* they actually do something sensible with such commands.
*/
shost->max_cmd_len = 12;
shost->hostt = sht;
shost->this_id = sht->this_id;
shost->can_queue = sht->can_queue;
shost->sg_tablesize = sht->sg_tablesize;
shost->sg_prot_tablesize = sht->sg_prot_tablesize;
shost->cmd_per_lun = sht->cmd_per_lun;
shost->no_write_same = sht->no_write_same;
shost->host_tagset = sht->host_tagset;
if (shost_eh_deadline == -1 || !sht->eh_host_reset_handler)
shost->eh_deadline = -1;
else if ((ulong) shost_eh_deadline * HZ > INT_MAX) {
shost_printk(KERN_WARNING, shost,
"eh_deadline %u too large, setting to %u\n",
shost_eh_deadline, INT_MAX / HZ);
shost->eh_deadline = INT_MAX;
} else
shost->eh_deadline = shost_eh_deadline * HZ;
if (sht->supported_mode == MODE_UNKNOWN)
/* means we didn't set it ... default to INITIATOR */
shost->active_mode = MODE_INITIATOR;
else
shost->active_mode = sht->supported_mode;
if (sht->max_host_blocked)
shost->max_host_blocked = sht->max_host_blocked;
else
shost->max_host_blocked = SCSI_DEFAULT_HOST_BLOCKED;
/*
* If the driver imposes no hard sector transfer limit, start at
* machine infinity initially.
*/
if (sht->max_sectors)
shost->max_sectors = sht->max_sectors;
else
shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS;
if (sht->max_segment_size)
shost->max_segment_size = sht->max_segment_size;
else
shost->max_segment_size = BLK_MAX_SEGMENT_SIZE;
/*
* assume a 4GB boundary, if not set
*/
if (sht->dma_boundary)
shost->dma_boundary = sht->dma_boundary;
else
shost->dma_boundary = 0xffffffff;
if (sht->virt_boundary_mask)
shost->virt_boundary_mask = sht->virt_boundary_mask;
device_initialize(&shost->shost_gendev);
dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
shost->shost_gendev.bus = &scsi_bus_type;
shost->shost_gendev.type = &scsi_host_type;
device_initialize(&shost->shost_dev);
shost->shost_dev.parent = &shost->shost_gendev;
shost->shost_dev.class = &shost_class;
dev_set_name(&shost->shost_dev, "host%d", shost->host_no);
shost->shost_dev.groups = scsi_sysfs_shost_attr_groups;
shost->ehandler = kthread_run(scsi_error_handler, shost,
"scsi_eh_%d", shost->host_no);
if (IS_ERR(shost->ehandler)) {
shost_printk(KERN_WARNING, shost,
"error handler thread failed to spawn, error = %ld\n",
PTR_ERR(shost->ehandler));
scsi: core: Fix bad pointer dereference when ehandler kthread is invalid Commit 66a834d09293 ("scsi: core: Fix error handling of scsi_host_alloc()") changed the allocation logic to call put_device() to perform host cleanup with the assumption that IDA removal and stopping the kthread would properly be performed in scsi_host_dev_release(). However, in the unlikely case that the error handler thread fails to spawn, shost->ehandler is set to ERR_PTR(-ENOMEM). The error handler cleanup code in scsi_host_dev_release() will call kthread_stop() if shost->ehandler != NULL which will always be the case whether the kthread was successfully spawned or not. In the case that it failed to spawn this has the nasty side effect of trying to dereference an invalid pointer when kthread_stop() is called. The following splat provides an example of this behavior in the wild: scsi host11: error handler thread failed to spawn, error = -4 Kernel attempted to read user page (10c) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x0000010c Faulting instruction address: 0xc00000000818e9a8 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvscsi(+) scsi_transport_srp dm_multipath dm_mirror dm_region hash dm_log dm_mod fuse overlay squashfs loop CPU: 12 PID: 274 Comm: systemd-udevd Not tainted 5.13.0-rc7 #1 NIP: c00000000818e9a8 LR: c0000000089846e8 CTR: 0000000000007ee8 REGS: c000000037d12ea0 TRAP: 0300 Not tainted (5.13.0-rc7) MSR: 800000000280b033 &lt;SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE&gt; CR: 28228228 XER: 20040001 CFAR: c0000000089846e4 DAR: 000000000000010c DSISR: 40000000 IRQMASK: 0 GPR00: c0000000089846e8 c000000037d13140 c000000009cc1100 fffffffffffffffc GPR04: 0000000000000001 0000000000000000 0000000000000000 c000000037dc0000 GPR08: 0000000000000000 c000000037dc0000 0000000000000001 00000000fffff7ff GPR12: 0000000000008000 c00000000a049000 c000000037d13d00 000000011134d5a0 GPR16: 0000000000001740 c0080000190d0000 c0080000190d1740 c000000009129288 GPR20: c000000037d13bc0 0000000000000001 c000000037d13bc0 c0080000190b7898 GPR24: c0080000190b7708 0000000000000000 c000000033bb2c48 0000000000000000 GPR28: c000000046b28280 0000000000000000 000000000000010c fffffffffffffffc NIP [c00000000818e9a8] kthread_stop+0x38/0x230 LR [c0000000089846e8] scsi_host_dev_release+0x98/0x160 Call Trace: [c000000033bb2c48] 0xc000000033bb2c48 (unreliable) [c0000000089846e8] scsi_host_dev_release+0x98/0x160 [c00000000891e960] device_release+0x60/0x100 [c0000000087e55c4] kobject_release+0x84/0x210 [c00000000891ec78] put_device+0x28/0x40 [c000000008984ea4] scsi_host_alloc+0x314/0x430 [c0080000190b38bc] ibmvscsi_probe+0x54/0xad0 [ibmvscsi] [c000000008110104] vio_bus_probe+0xa4/0x4b0 [c00000000892a860] really_probe+0x140/0x680 [c00000000892aefc] driver_probe_device+0x15c/0x200 [c00000000892b63c] device_driver_attach+0xcc/0xe0 [c00000000892b740] __driver_attach+0xf0/0x200 [c000000008926f28] bus_for_each_dev+0xa8/0x130 [c000000008929ce4] driver_attach+0x34/0x50 [c000000008928fc0] bus_add_driver+0x1b0/0x300 [c00000000892c798] driver_register+0x98/0x1a0 [c00000000810eb60] __vio_register_driver+0x80/0xe0 [c0080000190b4a30] ibmvscsi_module_init+0x9c/0xdc [ibmvscsi] [c0000000080121d0] do_one_initcall+0x60/0x2d0 [c000000008261abc] do_init_module+0x7c/0x320 [c000000008265700] load_module+0x2350/0x25b0 [c000000008265cb4] __do_sys_finit_module+0xd4/0x160 [c000000008031110] system_call_exception+0x150/0x2d0 [c00000000800d35c] system_call_common+0xec/0x278 Fix this be nulling shost->ehandler when the kthread fails to spawn. Link: https://lore.kernel.org/r/20210701195659.3185475-1-tyreld@linux.ibm.com Fixes: 66a834d09293 ("scsi: core: Fix error handling of scsi_host_alloc()") Cc: stable@vger.kernel.org Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-01 19:56:59 +00:00
shost->ehandler = NULL;
goto fail;
}
shost->tmf_work_q = alloc_workqueue("scsi_tmf_%d",
WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS,
1, shost->host_no);
if (!shost->tmf_work_q) {
shost_printk(KERN_WARNING, shost,
"failed to create tmf workq\n");
goto fail;
}
scsi_proc_hostdir_add(shost->hostt);
return shost;
fail:
/*
* Host state is still SHOST_CREATED and that is enough to release
* ->shost_gendev. scsi_host_dev_release() will free
* dev_name(&shost->shost_dev).
*/
put_device(&shost->shost_gendev);
return NULL;
}
EXPORT_SYMBOL(scsi_host_alloc);
static int __scsi_host_match(struct device *dev, const void *data)
{
struct Scsi_Host *p;
const unsigned short *hostnum = data;
p = class_to_shost(dev);
return p->host_no == *hostnum;
}
/**
* scsi_host_lookup - get a reference to a Scsi_Host by host no
* @hostnum: host number to locate
*
* Return value:
* A pointer to located Scsi_Host or NULL.
*
* The caller must do a scsi_host_put() to drop the reference
* that scsi_host_get() took. The put_device() below dropped
* the reference from class_find_device().
**/
struct Scsi_Host *scsi_host_lookup(unsigned short hostnum)
{
struct device *cdev;
struct Scsi_Host *shost = NULL;
cdev = class_find_device(&shost_class, NULL, &hostnum,
__scsi_host_match);
if (cdev) {
shost = scsi_host_get(class_to_shost(cdev));
put_device(cdev);
}
return shost;
}
EXPORT_SYMBOL(scsi_host_lookup);
/**
* scsi_host_get - inc a Scsi_Host ref count
* @shost: Pointer to Scsi_Host to inc.
**/
struct Scsi_Host *scsi_host_get(struct Scsi_Host *shost)
{
if ((shost->shost_state == SHOST_DEL) ||
!get_device(&shost->shost_gendev))
return NULL;
return shost;
}
EXPORT_SYMBOL(scsi_host_get);
static bool scsi_host_check_in_flight(struct request *rq, void *data,
bool reserved)
{
int *count = data;
struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
(*count)++;
return true;
}
/**
* scsi_host_busy - Return the host busy counter
* @shost: Pointer to Scsi_Host to inc.
**/
int scsi_host_busy(struct Scsi_Host *shost)
{
int cnt = 0;
blk_mq_tagset_busy_iter(&shost->tag_set,
scsi_host_check_in_flight, &cnt);
return cnt;
}
EXPORT_SYMBOL(scsi_host_busy);
/**
* scsi_host_put - dec a Scsi_Host ref count
* @shost: Pointer to Scsi_Host to dec.
**/
void scsi_host_put(struct Scsi_Host *shost)
{
put_device(&shost->shost_gendev);
}
EXPORT_SYMBOL(scsi_host_put);
int scsi_init_hosts(void)
{
return class_register(&shost_class);
}
void scsi_exit_hosts(void)
{
class_unregister(&shost_class);
ida_destroy(&host_index_ida);
}
int scsi_is_host_device(const struct device *dev)
{
return dev->type == &scsi_host_type;
}
EXPORT_SYMBOL(scsi_is_host_device);
/**
* scsi_queue_work - Queue work to the Scsi_Host workqueue.
* @shost: Pointer to Scsi_Host.
* @work: Work to queue for execution.
*
* Return value:
* 1 - work queued for execution
* 0 - work is already queued
* -EINVAL - work queue doesn't exist
**/
int scsi_queue_work(struct Scsi_Host *shost, struct work_struct *work)
{
if (unlikely(!shost->work_q)) {
shost_printk(KERN_ERR, shost,
"ERROR: Scsi host '%s' attempted to queue scsi-work, "
"when no workqueue created.\n", shost->hostt->name);
dump_stack();
return -EINVAL;
}
return queue_work(shost->work_q, work);
}
EXPORT_SYMBOL_GPL(scsi_queue_work);
/**
* scsi_flush_work - Flush a Scsi_Host's workqueue.
* @shost: Pointer to Scsi_Host.
**/
void scsi_flush_work(struct Scsi_Host *shost)
{
if (!shost->work_q) {
shost_printk(KERN_ERR, shost,
"ERROR: Scsi host '%s' attempted to flush scsi-work, "
"when no workqueue created.\n", shost->hostt->name);
dump_stack();
return;
}
flush_workqueue(shost->work_q);
}
EXPORT_SYMBOL_GPL(scsi_flush_work);
static bool complete_all_cmds_iter(struct request *rq, void *data, bool rsvd)
{
struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
enum scsi_host_status status = *(enum scsi_host_status *)data;
scsi_dma_unmap(scmd);
scmd->result = 0;
set_host_byte(scmd, status);
scmd->scsi_done(scmd);
return true;
}
/**
* scsi_host_complete_all_commands - Terminate all running commands
* @shost: Scsi Host on which commands should be terminated
* @status: Status to be set for the terminated commands
*
* There is no protection against modification of the number
* of outstanding commands. It is the responsibility of the
* caller to ensure that concurrent I/O submission and/or
* completion is stopped when calling this function.
*/
void scsi_host_complete_all_commands(struct Scsi_Host *shost,
enum scsi_host_status status)
{
blk_mq_tagset_busy_iter(&shost->tag_set, complete_all_cmds_iter,
&status);
}
EXPORT_SYMBOL_GPL(scsi_host_complete_all_commands);
struct scsi_host_busy_iter_data {
bool (*fn)(struct scsi_cmnd *, void *, bool);
void *priv;
};
static bool __scsi_host_busy_iter_fn(struct request *req, void *priv,
bool reserved)
{
struct scsi_host_busy_iter_data *iter_data = priv;
struct scsi_cmnd *sc = blk_mq_rq_to_pdu(req);
return iter_data->fn(sc, iter_data->priv, reserved);
}
/**
* scsi_host_busy_iter - Iterate over all busy commands
* @shost: Pointer to Scsi_Host.
* @fn: Function to call on each busy command
* @priv: Data pointer passed to @fn
*
* If locking against concurrent command completions is required
* ithas to be provided by the caller
**/
void scsi_host_busy_iter(struct Scsi_Host *shost,
bool (*fn)(struct scsi_cmnd *, void *, bool),
void *priv)
{
struct scsi_host_busy_iter_data iter_data = {
.fn = fn,
.priv = priv,
};
blk_mq_tagset_busy_iter(&shost->tag_set, __scsi_host_busy_iter_fn,
&iter_data);
}
EXPORT_SYMBOL_GPL(scsi_host_busy_iter);