Commit graph

318 commits

Author SHA1 Message Date
Dan Williams
35bf88212b libata: end the r-word
Prompted by the social effort in the US to discourage usage of the
adjective "retarded".

In this case we needlessly anthropomorphize hard drives.  The
implication is that due to design deficiencies in the device reset
recovery time is negatively impacted.  We can simply clearly state that
fact.  "Exceptional devices cause outliers in reset recovery time." This
steers clear of any unintended comparison of such devices to humans with
cognitive disabilities.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-07 17:30:06 -05:00
Levente Kurusa
462098b090 ata: libata-eh: Remove unnecessary snprintf arithmetic
Remove an unnecessary arithmetic operation from a call to snprintf, because
the size parameter of snprintf includes the trailing null space.
Also, initialize the buffer on definition instead of a memset call.

Signed-off-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 17:21:10 -05:00
Wolfram Sang
16735d022f tree-wide: use reinit_completion instead of INIT_COMPLETION
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:21 +09:00
Linus Torvalds
13aa7e0bc2 Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata changes from Tejun Heo:
 "Nothing too interesting.  Only two minor fixes in libata core.  Most
  changes are specific to hardware which isn't too common"

* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: Add Device IDs for Intel Wildcat Point-LP
  sata_rcar: Convert to clk_prepare/unprepare
  drivers/libata: Set max sector to 65535 for Slimtype DVD A DS8A9SH drive
  libata: Add some missing command descriptions
  sata_highbank: clear whole array in highbank_initialize_phys()
  ahci: disabled FBS prior to issuing software reset
  libata: Fix display of sata speed
  ahci: imx: setup power saving methods
  ata_piix: minor typo and a printk fix
  ahci: Changing two module params with static and __read_mostly
2013-11-13 15:18:22 +09:00
Robert Hancock
3915c3b5b1 libata: Add some missing command descriptions
Add some missing command enumerations from the ATA-8 ACS-3 spec into
include/linux/ata.h, and add the corresponding human-readable command
descriptions in libata-eh.c.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-10-27 08:40:39 -04:00
Gwendal Grignou
f13e220161 libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures
libata EH decrements scmd->retries when the command failed for reasons
unrelated to the command itself so that, for example, commands aborted
due to suspend / resume cycle don't get penalized; however,
decrementing scmd->retries isn't enough for ATA passthrough commands.

Without this fix, ATA passthrough commands are not resend to the
drive, and no error is signalled to the caller because:

- allowed retry count is 1
- ata_eh_qc_complete fill the sense data, so result is valid
- sense data is filled with untouched ATA registers.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2013-10-07 15:18:25 -04:00
Tejun Heo
8c3d3d4b12 libata: update "Maintained by:" tags
Jeff moved on to a greener pasture.

 s/Maintained by: Jeff Garzik/Maintained by: Tejun Heo/g

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
2013-05-14 11:13:04 -07:00
Aaron Lu
a7ff60dbe0 [libata] pm: differentiate system and runtime pm for ata port
We need to do different things for system PM and runtime PM, e.g. we do
not need to enable runtime wake for ZPODD when we are doing system
suspend, etc.

Currently, we use PMSG_SUSPEND for both system suspend and runtime
suspend and PMSG_ON for both system resume and runtime resume. Change
this by using PMSG_AUTO_SUSPEND for runtime suspend and PMSG_AUTO_RESUME
for runtime resume. And since PMSG_ON means no transition, it is changed
to PMSG_RESUME for ata port's system resume.

The ata_acpi_set_state is modified accordingly, and the sata case and
pata case is seperated for easy reading.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-25 15:33:33 -05:00
Aaron Lu
213342053d libata: handle power transition of ODD
When ata port is runtime suspended, it will check if the ODD attched to
it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero
power ready state. And if this is not the case, the highest acpi state
will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD. And
if the ODD can be powered off, runtime wake capability needs to be
enabled and powered_off flag will be set to let resume code knows that
the ODD was in powered off state.

And on resume, before it is powered on, if it was powered off during
suspend, runtime wake capability needs to be disabled. After it is
recovered, the ODD is considered functional, post power on processing
like eject tray if the ODD is drawer type is done, and several ZPODD
related fields will also be reset.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-21 15:41:56 -05:00
Aaron Lu
3dc67440d9 libata: check zero power ready status for ZPODD
Per the Mount Fuji spec, the ODD is considered zero power ready when:
  - For slot type ODD, no media inside;
  - For tray type ODD, no media inside and tray closed.

The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.

The information provided by the media status byte is not accurate, it
is possible that after a new disc is just inserted, the status byte
still returns media not present. So this information can not be used as
the deciding factor, we use sense code to decide if zpready status is
true.

When we first sensed the ODD in the zero power ready state, the
zp_sampled will be set and timestamp will be recoreded. And after ODD
stayed in this state for some pre-defined period, the ODD is considered
as power off ready and the zp_ready flag will be set. The zp_ready flag
serves as the deciding factor other code will use to see if power off is
OK for the ODD.

The Mount Fuji spec suggests a delay should be used here, to avoid the
case user ejects the ODD and then instantly inserts a new one again, so
that we can avoid a power transition. And some ODDs may be slow to place
its head to the home position after disc is ejected, so a delay here is
generally a good idea. And the delay time can be changed via the module
param zpodd_poweroff_delay.

The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-21 15:41:34 -05:00
Bian Yu
1eaca39a84 [libata] ahci: Fix lack of command retry after a success error handler.
It should be a mistake introduced by commit 8d899e70c1.

qc->flags can't be set AC_ERR_*

Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-14 13:05:17 -05:00
Aaron Lu
5416912af7 libata: set dma_mode to 0xff in reset
ata_device->dma_mode's initial value is zero, which is not a valid dma
mode, but ata_dma_enabled will return true for this value. This patch
sets dma_mode to 0xff in reset function, so that ata_dma_enabled will
not return true for this case, or it will cause problem for pata_acpi.

The corrsponding bugzilla page is at:
https://bugzilla.kernel.org/show_bug.cgi?id=49151

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Szymon Janc <szymon@janc.net.pl>
Tested-by: Dutra Julio <dutra.julio@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-12-03 05:07:13 -05:00
Linus Torvalds
3151367f87 SCSI for-linus on 20121002
This is a large set of updates, mostly for drivers (qla2xxx [including support
 for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa, be2iscsi, isci,
 lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas).  There's also a rework for tape
 adding virtually unlimited numbers of tape drives plus a set of dif fixes for
 sd and a fix for a live lock on hot remove of SCSI devices.
 
 This round includes a signed tag pull of isci-for-3.6
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJQaqCFAAoJEDeqqVYsXL0MKJ4IALg/Obnk0/fNvBUNIrh5zRmj
 r9UlXFJnlEDT03qRGdn8okgWMChbgaD1ZrwDTQnjNsabVQoTXI6oO6/uL2c8crpY
 BFBwJvkNJS99nbcZv10CpJ3K7ykmRnKlkYon12iknhGwdtU+XJ14Z4PUcZkI9jmg
 sBQQ6uNVWyosaONNE+k6o+dw6OTttJkzRX8e9in3thstxNTcG+h9iB1zZ/ETkSEj
 tD4MyOgDiPf8kPV2awQThQGpni9Tu3SQr5dEn/iUUktUjiYsDNQuyaAk+QzyhUU7
 D35iIJnIHlXTSTMQkrG4qpJHBvqPkWlYJzaOmheQryQ3vzp2C5Ly/hS9il45uIQ=
 =49u9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This is a large set of updates, mostly for drivers (qla2xxx [including
  support for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa,
  be2iscsi, isci, lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas).

  There's also a rework for tape adding virtually unlimited numbers of
  tape drives plus a set of dif fixes for sd and a fix for a live lock
  on hot remove of SCSI devices.

  This round includes a signed tag pull of isci-for-3.6

  Signed-off-by: James Bottomley <JBottomley@Parallels.com>"

Fix up trivial conflict in drivers/scsi/qla2xxx/qla_nx.c due to new PCI
helper function use in a function that was removed by this pull.

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (198 commits)
  [SCSI] st: remove st_mutex
  [SCSI] sd: Ensure we correctly disable devices with unknown protection type
  [SCSI] hpsa: gen8plus Smart Array IDs
  [SCSI] qla4xxx: Update driver version to 5.03.00-k1
  [SCSI] qla4xxx: Disable generating pause frames for ISP83XX
  [SCSI] qla4xxx: Fix double clearing of risc_intr for ISP83XX
  [SCSI] qla4xxx: IDC implementation for Loopback
  [SCSI] qla4xxx: update copyrights in LICENSE.qla4xxx
  [SCSI] qla4xxx: Fix panic while rmmod
  [SCSI] qla4xxx: Fail probe_adapter if IRQ allocation fails
  [SCSI] qla4xxx: Prevent MSI/MSI-X falling back to INTx for ISP82XX
  [SCSI] qla4xxx: Update idc reg in case of PCI AER
  [SCSI] qla4xxx: Fix double IDC locking in qla4_8xxx_error_recovery
  [SCSI] qla4xxx: Clear interrupt while unloading driver for ISP83XX
  [SCSI] qla4xxx: Print correct IDC version
  [SCSI] qla4xxx: Added new mbox cmd to pass driver version to FW
  [SCSI] scsi_dh_alua: Enable STPG for unavailable ports
  [SCSI] scsi_remove_target: fix softlockup regression on hot remove
  [SCSI] ibmvscsi: Fix host config length field overflow
  [SCSI] ibmvscsi: Remove backend abstraction
  ...
2012-10-02 19:01:32 -07:00
Shane Huang
65fe1f0f66 ahci: implement aggressive SATA device sleep support
Device Sleep is a feature as described in AHCI 1.3.1 Technical Proposal.
This feature enables an HBA and SATA storage device to enter the DevSleep
interface state, enabling lower power SATA-based systems.

Aggressive Device Sleep enables the HBA to assert the DEVSLP signal as
soon as there are no commands outstanding to the device and the port
specific Device Sleep idle timer has expired. This enables autonomous
entry into the DevSleep interface state without waiting for software
in power sensitive systems.

This patch enables Aggressive Device Sleep only if both host controller
and device support it.

Tested on AMD reference board together with Device Sleep supported device
sample.

Signed-off-by: Shane Huang <shane.huang@amd.com>
Reviewed-by: Aaron Lu <aaron.lwe@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-09-13 01:08:53 -04:00
Dan Williams
ca6d43b051 [SCSI] libata: reset once
Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up.  In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-08-24 13:04:08 +04:00
Jeff Garzik
8407884dd9 Merge branch 'master' [vanilla Linus master] into libata-dev.git/upstream
Two bits were appended to the end of the bitfield
list in struct scsi_device.  Resolve that conflict
by including both bits.

Conflicts:
	include/scsi/scsi_device.h
2012-07-25 15:58:48 -04:00
H Hartley Sweeten
604284071a libata-eh.c: local functions should not be exposed globally
The function ata_ering_clear_cb is only referenced in this file and
should be marked static to prevent it from being exposed globally.

This quiets the sparse warning:

warning: symbol 'ata_ering_clear_cb' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-07-25 15:17:32 -04:00
Dan Williams
e4a9c3732c [SCSI] libata, libsas: introduce sched_eh and end_eh port ops
When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time.  Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:45 +01:00
Mark Lord
8d899e70c1 libata-eh don't waste time retrying media errors (v3)
ATA and SATA drives have had built-in retries for media errors
for as long as they've been commonplace in computers (early 1990s).

When libata stumbles across a bad sector, it can waste minutes
sitting there doing retry after retry before finally giving up
and letting the higher layers deal with it.

This patch removes retries for media errors only.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-05-07 15:40:34 -04:00
Lin Ming
6868225e3e libata: skip old error history when counting probe trials
Commit d902747("[libata] Add ATA transport class") introduced
ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared.

But ata_count_probe_trials_cb() didn't check this flag and it still
counts the old error history. So wrong probe trials count is returned
and it causes problem, for example, SATA link speed is slowed down from
3.0Gbps to 1.5Gbps.

Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb().

Cc: stable <stable@vger.kernel.org> # 2.6.37+
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-05-03 14:05:58 -04:00
Dan Williams
81c757bc69 [SCSI] libsas: execute transport link resets with libata-eh via host workqueue
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds.  They are not
prepared for it to loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:13:51 -06:00
Gwendal Grignou
7a46c0780b [libata] Issue SRST to Sil3726 PMP
Reenable sending SRST to devices connected behind a Sil3726 PMP.
This allow staggered spinups and handles drives that spins up slowly.

While the drives spin up, the PMP will not accept SRST.
Most controller reissues the reset until the drive is ready, while
some [Sil3124] returns an error.
In ata_eh_error, wait 10s before reset the ATA port and try again.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-11-09 01:38:00 -05:00
Paul Gortmaker
38789fda29 ide/ata: Add export.h for EXPORT_SYMBOL/THIS_MODULE where needed
They were getting this implicitly by an include of module.h
from device.h -- but we are going to clean that up and break
that include chain, so include export.h explicitly now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:37 -04:00
Sergei Shtylyov
e8411fbad6 libata-eh: ata_eh_followup_srst_needed() does not need 'classes' parameter
... since it does not use it.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-10-08 00:07:15 -04:00
Tejun Heo
8ea7645c5a libata: leave port thawed after reset failure
libata EH intentionally left a port frozen if it failed
ata_eh_reset().  The intention was avoiding continuous loop of resets
when the controller or attached device is flaky and reporting spurious
hotplug events.  Once port enters this state, it can be recovered with
manual rescan, which seemed reasonable.

However, outside of my convoluted test setup, there have been very few
reports justifying this choice while there have been more cases where
the automatic freezing of the port after hotplug attempt of a faulty
device caused confusion and led to unnecessary resets.

This patch changes the behavior so that the port is thawed after reset
failure.  This change doesn't necessarily solve but makes it easier
and more intuitive to work around hotplug related problems
(ie. re-pluggin or power cycling the device) as reported in the
followings.

  https://bugzilla.kernel.org/show_bug.cgi?id=34712
  http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Reartes Guillermo <rtguille@gmail.com>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23 17:57:36 -04:00
Joe Perches
a9a79dfec2 ata: Convert ata_<foo>_printk(KERN_<LEVEL> to ata_<foo>_<level>
Saves text by removing nearly duplicated text format strings by
creating ata_<foo>_printk functions and printf extension %pV.

ata defconfig size shrinks ~5% (~8KB), allyesconfig ~2.5% (~13KB)

Format string duplication comes from:

 #define ata_link_printk(link, lv, fmt, args...) do { \
       if (sata_pmp_attached((link)->ap) || (link)->ap->slave_link)    \
               printk("%sata%u.%02u: "fmt, lv, (link)->ap->print_id,   \
                      (link)->pmp , ##args); \
       else \
               printk("%sata%u: "fmt, lv, (link)->ap->print_id , ##args); \
       } while(0)

Coalesce long formats.

$ size drivers/ata/built-in.*
   text	   data	    bss	    dec	    hex	filename
 544969	  73893	 116584	 735446	  b38d6	drivers/ata/built-in.allyesconfig.ata.o
 558429	  73893	 117864	 750186	  b726a	drivers/ata/built-in.allyesconfig.dev_level.o
 141328	  14689	   4220	 160237	  271ed	drivers/ata/built-in.defconfig.ata.o
 149567	  14689	   4220	 168476	  2921c	drivers/ata/built-in.defconfig.dev_level.o

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23 17:57:36 -04:00
Tejun Heo
8c56cacc72 libata: fix unexpectedly frozen port after ata_eh_reset()
To work around controllers which can't properly plug events while
reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
after reset but before RESET is marked done.  As reset is the final
recovery action and full verification of devices including onlineness
and classfication match is done afterwards, this shouldn't lead to
lost devices or missed hotplug events.

Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
if the condition happens after resetting an empty port, the port could
be left frozen and EH will end without thawing it, making the port
unresponsive to further hotplug events.

Thaw if the port is frozen after clearing EH_PENDING.  This problem is
reported by Bruce Stenning in the following thread.

 http://thread.gmane.org/gmane.linux.kernel/1123265

stable: I think we should weather this patch a bit longer in -rcX
	before sending it to -stable.  Please wait at least a month
	after this patch makes upstream.  Thanks.

-v2: Fixed spelling in the comment per Dave Howorth.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Cc: stable@kernel.org
Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-06-07 15:55:55 -04:00
Kristen Carlson Accardi
8a745f1f39 libata: Power off empty ports
Give users the option of completely powering off unoccupied
SATA ports using the existing min_power link_power_management_policy
option.  When the use selects this option on an empty port, we
will power the port off by setting DET to off.  For occupied ports,
behavior is unchanged.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-19 20:50:53 -04:00
Tejun Heo
5f6f12ccf3 libata: fix oops when LPM is used with PMP
ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
However, @ap is NULL if @link points to a PMP link and thus the
unconditional @ap->flags dereference leads to the following oops.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  ...
  Pid: 295, comm: scsi_eh_4 Tainted: P            2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional
  RIP: 0010:[<ffffffff813f98e1>]  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  RSP: 0018:ffff880132defbf0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
  RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
  RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
  R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
  R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
  Stack:
   0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
   ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
   ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
  Call Trace:
   [<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30
   [<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci]
   [<ffffffff813faade>] ata_scsi_error+0x5be/0x900
   [<ffffffff813cf724>] scsi_error_handler+0x124/0x650
   [<ffffffff810834b6>] kthread+0x96/0xa0
   [<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10
  Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
  RIP  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
   RSP <ffff880132defbf0>
  CR2: 0000000000000018

Fix it by testing @link->ap->flags instead.

stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
        backported to 2.6.37 and 38.  This is a fix for that and thus
        also applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Nathan A. Mourey II" <nmoureyii@ne.rr.com>
LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain>
Cc: Connor H <cmdkhh@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-14 14:51:40 -04:00
Tejun Heo
ae01b2493c libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:32:16 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
James Bottomley
0e0b494ca8 libata: separate error handler into usable components
Right at the moment, the libata error handler is incredibly
monolithic.  This makes it impossible to use from composite drivers
like libsas and ipr which have to handle error themselves in the first
instance.

The essence of the change is to split the monolithic error handler
into two components: one which handles a queue of ata commands for
processing and the other which handles the back end of readying a
port.  This allows the upper error handler fine grained control in
calling libsas functions (and making sure they only get called for ATA
commands whose lower errors have been fixed up).

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:45 -05:00
James Bottomley
c34aeebc06 libata: fix eh locking
The SCSI host eh_cmd_q should be protected by the host lock (not the
port lock).  This probably doesn't matter that much at the moment,
since we try to serialise the add and eh pieces, but it might matter
in future for more convenient error handling.  Plus this switches
libata to the standard eh pattern where you lock, remove from the cmd
queue to a local list and unlock and then operate on the local list.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:45 -05:00
Tejun Heo
eb0e85e36b libata: fix hotplug for drivers which don't implement LPM
ata_eh_analyze_serror() suppresses hotplug notifications if LPM is
being used because LPM generates spurious hotplug events.  It compared
whether link->lpm_policy was different from ATA_LPM_MAX_POWER to
determine whether LPM is enabled; however, this is incorrect as for
drivers which don't implement LPM, lpm_policy is always
ATA_LPM_UNKNOWN.  This disabled hotplug detection for all drivers
which don't implement LPM.

Fix it by comparing whether lpm_policy is greater than
ATA_LPM_MAX_POWER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:34:20 -05:00
Tejun Heo
e5005b15c9 libata: issue DIPM enable commands with LPM state updated
Low level drivers may behave differently depending on the current
link->lpm_policy.  During ata_eh_set_lpm(), DIPM enable commands are
issued after the successful completion of ap->ops->set_lpm(), which
means that the controller is already in the target state.  This causes
DIPM enable commands to be processed with mismatching controller power
state and link->lpm_policy value.

In ahci, link->lpm_policy is used to ignore certain PHY events if LPM
is enabled; however, as DIPM commands are issued with stale
link->lpm_policy, they sometimes end up triggering these conditions
and get aborted leading to LPM configuration failure.

Fix it by updating link->lpm_policy before issuing DIPM enable
commands.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-12-24 13:34:34 -05:00
Tejun Heo
c0c362b60e libata: implement cross-port EH exclusion
In libata, the non-EH code paths should always take and release
ap->lock explicitly when accessing hardware or shared data structures.
However, once EH is active, it's assumed that the port is owned by EH
and EH methods don't explicitly take ap->lock unless race from irq
handler or other code paths are expected.  However, libata EH didn't
guarantee exclusion among EHs for ports of the same host.  IOW,
multiple EHs may execute in parallel on multiple ports of the same
controller.

In many cases, especially in SATA, the ports are completely
independent of each other and this doesn't cause problems; however,
there are cases where different ports share the same resource, which
lead to obscure timing related bugs such as the one fixed by commit
213373cf (ata_piix: fix locking around SIDPR access).

This patch implements exclusion among EHs of the same host.  When EH
begins, it acquires per-host EH ownership by calling ata_eh_acquire().
When EH finishes, the ownership is released by calling
ata_eh_release().  EH ownership is also released whenever the EH
thread goes to sleep from ata_msleep() or explicitly and reacquired
after waking up.

This ensures that while EH is actively accessing the hardware, it has
exclusive access to it while allowing EHs to interleave and progress
in parallel as they hit waiting stages, which dominate the time spent
in EH.  This achieves cross-port EH exclusion without pervasive and
fragile changes while still allowing parallel EH for the most part.

This was first reported by yuanding02@gmail.com more than three years
ago in the following bugzilla.  :-)

  https://bugzilla.kernel.org/show_bug.cgi?id=8223

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: yuanding02@gmail.com
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:05 -04:00
Tejun Heo
97750cebb3 libata: add @ap to ata_wait_register() and introduce ata_msleep()
Add optional @ap argument to ata_wait_register() and replace msleep()
calls with ata_msleep() which take optional @ap in addition to the
duration.  These will be used to implement EH exclusion.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:05 -04:00
Tejun Heo
6c8ea89cec libata: implement LPM support for port multipliers
Port multipliers can do DIPM on fan-out links fine.  Implement support
for it.  Tested w/ SIMG 57xx and marvell PMPs.  Both the host and
fan-out links enter power save modes nicely.

SIMG 37xx and 47xx report link offline on SStatus causing EH to detach
the devices.  Blacklisted.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Tejun Heo
6b7ae9545a libata: reimplement link power management
The current LPM implementation has the following issues.

* Operation order isn't well thought-out.  e.g. HIPM should be
  configured after IPM in SControl is properly configured.  Not the
  other way around.

* Suspend/resume paths call ata_lpm_enable/disable() which must only
  be called from EH context directly.  Also, ata_lpm_enable/disable()
  were called whether LPM was in use or not.

* Implementation is per-port when it should be per-link.  As a result,
  it can't be used for controllers with slave links or PMP.

* LPM state isn't managed consistently.  After a link reset for
  whatever reason including suspend/resume the actual LPM state would
  be reset leaving ap->lpm_policy inconsistent.

* Generic/driver-specific logic boundary isn't clear.  Currently,
  libahci has to mangle stuff which libata EH proper should be
  handling.  This makes the implementation unnecessarily complex and
  fragile.

* Tied to ALPM.  Doesn't consider DIPM only cases and doesn't check
  whether the device allows HIPM.

* Error handling isn't implemented.

Given the extent of mismatch with the rest of libata, I don't think
trying to fix it piecewise makes much sense.  This patch reimplements
LPM support.

* The new implementation is per-link.  The target policy is still
  port-wide (ap->target_lpm_policy) but all the mechanisms and states
  are per-link and integrate well with the rest of link abstraction
  and can work with slave and PMP links.

* Core EH has proper control of LPM state.  LPM state is reconfigured
  when and only when reconfiguration is necessary.  It makes sure that
  LPM state is reset when probing for new device on the link.
  Controller agnostic logic is now implemented in libata EH proper and
  driver implementation only has to deal with controller specifics.

* Proper error handling.  LPM config failure is attributed to the
  device on the link and LPM is disabled for the link if it fails
  repeatedly.

* ops->enable/disable_pm() are replaced with single ops->set_lpm()
  which takes @policy and @hints.  This simplifies driver specific
  implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Tejun Heo
c93b263e0d libata: clean up lpm related symbols and sysfs show/store functions
Link power management related symbols are in confusing state w/ mixed
usages of lpm, ipm and pm.  This patch cleans up lpm related symbols
and sysfs show/store functions as follows.

* lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and
  MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and
  ATA_LPM_{MIN|MAX|MED}_POWER.

* Pre/postfixes are unified to lpm.

* sysfs show/store functions for link_power_management_policy were
  curiously named get/put and unnecessarily complex.  Renamed to
  show/store and simplified.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Gwendal Grignou
d9027470b8 [libata] Add ATA transport class
This is a scheleton for libata transport class.
All information is read only, exporting information from libata:
- ata_port class: one per ATA port
- ata_link class: one per ATA port or 15 for SATA Port Multiplier
- ata_device class: up to 2 for PATA link, usually one for SATA.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:03 -04:00
Tejun Heo
e2f3d75fc0 libata: skip EH autopsy and recovery during suspend
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend.  As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:27:59 -04:00
Linus Torvalds
3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
FUJITA Tomonori
acad76272c [libata] add ATA_CMD_DSM to ata_get_cmd_descript
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-08-01 19:36:03 -04:00
Tejun Heo
ad72cf9885 libata: take advantage of cmwq and remove concurrency limitations
libata has two concurrency related limitations.

a. ata_wq which is used for polling PIO has single thread per CPU.  If
   there are multiple devices doing polling PIO on the same CPU, they
   can't be executed simultaneously.

b. ata_aux_wq which is used for SCSI probing has single thread.  In
   cases where SCSI probing is stalled for extended period of time
   which is possible for ATAPI devices, this will stall all probing.

#a is solved by increasing maximum concurrency of ata_wq.  Please note
that polling PIO might be used under allocation path and thus needs to
be served by a separate wq with a rescuer.

#b is solved by using the default wq instead and achieving exclusion
via per-port mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
2010-07-02 10:59:24 +02:00
Tejun Heo
fe06e5f9b7 libata-sff: separate out BMDMA EH
Some of error handling logic in ata_sff_error_handler() and all of
ata_sff_post_internal_cmd() are for BMDMA.  Create
ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move
BMDMA part into those.

While at it, change DMA protocol check to ata_is_dma(), fix
post_internal_cmd to call ap->ops->bmdma_stop instead of directly
calling ata_bmdma_stop() and open code hardreset selection so that
ata_std_error_handler() doesn't have to know about sff hardreset.

As these two functions are BMDMA specific, there's no reason to check
for bmdma_addr before calling bmdma methods if the protocol of the
failed command is DMA.  sata_mv and pata_mpc52xx now don't need to set
.post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't
need to set it to their bmdma_stop routines.

ata_sff_post_internal_cmd() becomes noop and is removed.

This fixes p3 described in clean-up-BMDMA-initialization patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-19 13:36:46 -04:00
Tejun Heo
c429137a67 libata-sff: port_task is SFF specific
port_task is tightly bound to the standard SFF PIO HSM implementation.
Using it for any other purpose would be error-prone and there's no
such user and if some drivers need such feature, it would be much
better off using its own.  Move it inside CONFIG_ATA_SFF and rename it
to sff_pio_task.

The only function which is exposed to the core layer is
ata_sff_flush_pio_task() which is renamed from ata_port_flush_task()
and now also takes care of resetting hsm_task_state to HSM_ST_IDLE,
which is possible as it's now specific to PIO HSM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-19 13:35:49 -04:00
Jeff Garzik
a09bf4cd53 libata: ensure NCQ error result taskfile is fully initialized
before returning it via qc->result_tf.

Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22 21:59:13 -04:00
Tejun Heo
fa41efdae7 libata: fix locking around blk_abort_request()
blk_abort_request() expectes queue lock to be held by the caller.
Grab it before calling the function.

Lack of this synchronization led to infinite loop on corrupt
q->timeout_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22 21:47:52 -04:00
Tejun Heo
534ead7092 libata: retry FS IOs even if it has failed with AC_ERR_INVALID
libata currently doesn't retry if a command fails with AC_ERR_INVALID
assuming that retrying won't get it any further even if retried.
However, a failure may be classified as invalid through hardware
glitch (incorrect reading of the error register or firmware bug) and
there isn't whole lot to gain by not retrying as actually invalid
commands will be failed immediately.  Also, commands serving FS IOs
are extremely unlikely to be invalid.  Retry FS IOs even if it's
marked invalid.

Transient and incorrect invalid failure was seen while debugging
firmware related issue on Samsung n130 on bko#14314.

  http://bugzilla.kernel.org/show_bug.cgi?id=14314

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-01-20 14:25:11 -05:00
Tejun Heo
6013efd886 libata: retry failed FLUSH if device didn't fail it
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.

However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.

This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination.  Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode.  This was reported in the following bugzilla bug.

  http://bugzilla.kernel.org/show_bug.cgi?id=14543

This patch makes libata EH retry FLUSH if it wasn't failed by the
device.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Tejun Heo
4f7c287499 libata: fix PMP initialization
Commit 842faa6c1a fixed error handling
during attach by not committing detected device class to dev->class
while attaching a new device.  However, this change missed the PMP
class check in the configuration loop causing a new PMP device to go
through ata_dev_configure() as if it were an ATA or ATAPI device.

As PMP device doesn't have a regular IDENTIFY data, this makes
ata_dev_configure() tries to configure a PMP device using an invalid
data.  For the most part, it wasn't too harmful and went unnoticed but
this ends up clearing dev->flags which may have ATA_DFLAG_AN set by
sata_pmp_attach().  This means that SATA_PMP_FEAT_NOTIFY ends up being
disabled on PMPs and on PMPs which honor the flag breaks hotplug
support.

This problem was discovered and reported by Ethan Hsiao.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-16 06:21:54 -04:00
Tejun Heo
3b761d3d43 libata: fix incorrect link online check during probe
While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782 incorrectly added link
offline check logic before ata_eh_thaw() was called.  This means that
if an occupied link goes down briefly at the time that offline check
was performed, device class will be cleared to ATA_DEV_NONE and libata
wouldn't retry thus failing detection of the device.

The offline check should be done after the port is thawed together
with online check so that such link glitches can be detected by the
interrupt handler and handled properly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Blechmann <tim@klingt.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-06 20:58:18 -04:00
Robert Hancock
6521148c64 libata: add command name parsing for error output
This patch improve libata's output for error/notification messages
to allow easier comprehension and debugging:

When ATAPI commands issued through the SCSI layer fail, use SCSI
functions to print the CDB in human-readable form instead of just
dumping out the CDB in hex.

Print out the name of the failed command (as defined by the ATA
specification) in error handling output along with the raw register
contents.

When reporting status of ACPI taskfile commands executed on resume,
also output the names of the commands being executed (or not) in
readable form.

Since the extra data for printing command names increases kernel
size slightly, a config option has been added to allow disabling
command name output (as well as some of the error register parsing)
for those highly sensitive to kernel text size.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:20 -04:00
Tejun Heo
1e641060c4 libata: clear eh_info on reset completion
Resets are done with port frozen but some controllers still issue
interrupts during reset and they may end up recording error conditions
in ehi leading to unnecessary EH retrials.

This patch makes ata_eh_reset() clear ehi on reset completion.  As
reset is the most severe recovery action, there's nothing to lose by
clearing ehi on its completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:19 -04:00
Jeff Garzik
54c38444fa [libata] EH: freeze port before aborting commands
Call the ->freeze() hook before aborting qc's, because some hardware
requires special handling prior to accessing the taskfile registers
(for diagnosis/analysis/reset).  Most notably, hardware may wish to
disable the DMA engine or interrupts in the ->freeze() hook.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:19 -04:00
Bartlomiej Zolnierkiewicz
705d201414 libata: add missing NULL pointer check to ata_eh_reset()
drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable derefenced before check 'slave'

Please note that this is _not_ a real bug at the moment since ata_eh_context
structure is embedded into ata_list structure and the code alwas checks for
'slave' before accessing 'sehc'.

Anyway lets add missing check and always have a valid 'sehc' pointer (which
makes code easier to understand and prevents introducing some possible bugs
in the future).

Reported-by: Dan Carpenter <error27@gmail.com>
Cc: corbet@lwn.net
Cc: eteo@redhat.com
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-07-28 21:05:41 -04:00
Tejun Heo
fe2c4d018f libata: fix follow-up SRST failure path
ata_eh_reset() was missing error return handling after follow-up SRST
allowing EH to continue the normal probing path after reset failure.
This was discovered while testing new WD 2TB drives which take longer
than 10 secs to spin up and cause the first follow-up SRST to time
out.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-07-14 22:41:28 -04:00
Martin Olsson
98a1708de1 trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments.
Signed-off-by: Martin Olsson <martin@minimum.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:46 +02:00
Tejun Heo
6f9c1ea2c1 libata: clear ering on resume
Error timestamps are in jiffies which doesn't run while suspended and
PHY events during resume isn't too uncommon.  When the two are
combined, it can lead to unnecessary speed downs if the machine is
suspended and resumed repeatedly.  Clear error history on resume.

This was reported and verified in bnc#486803 by Vladimir Botka.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vladimir Botka <vbotka@novell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-11 14:30:59 -04:00
Tejun Heo
842faa6c1a libata: fix attach error handling
New device attach path in ata_eh_revalidate_and_attach() is divided
into two separate loops because ATA requires IDENTIFY to be issued to
slave first while the user expects to see device probe messages from
the master device.  new_mask is used to track which devices are the
new ones between the first loop and the second.

This usually works well but if an error occurs during configuration
stage, ata_dev_revalidate_and_attach() returns with error code and
forgets new_mask.  On the retry run, dev->class is set and new_mask
for the device is clear, so the device just gets revalidated and thus
ends up skipping post-configuration procedure including scheduling of
SCSI_HOTPLUG for the device.  When this occurs, ATA part of probing
works fine but SCSI probing usually doesn't happen and makes the
device unreachable.

The behavior has been around for a very long time but it has been
uncovered with the recent addition of 1_5_GBPS horkage which uses
-EAGAIN return value from ata_dev_configure() to restart the probing
sequence after forcing cable speed.

This can be fixed by making sure dev->class is permanently set only
after all configurations are successfully complete.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-11 14:26:01 -04:00
Alan Cox
c96f1732e2 [libata] Improve timeout handling
On a timeout call a device specific handler early in the recovery so that
we can complete and process successful commands which timed out due to IRQ
loss or the like rather more elegantly.

[Revised to exclude the timeout handling on a few devices that inherit from
 SFF but are not SFF enough to use the default timeout handler]

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-24 22:52:39 -04:00
Tejun Heo
d6515e6ff4 libata: make sure port is thawed when skipping resets
When SCR access is available and the link is offline, softreset is
skipped as it only wastes time and some controllers don't respond very
well.  However, the skip path forgot to thaw the port, which not only
blocks further event notification from the port but also causes
repeated EH invocations on the same event on drivers which rely on
->thaw() to clear events if the IRQ is shared with another device or
port.

This problem has always been there but is uncovered by recent sata_nv
nf2/3 change which dropped hardreset support while maintaining SCR
access.  nf2/3 doesn't clear hotplug event mask from the interrupt
handler but relies on ->thaw() to clear them.  When the hardreset was
there, the reset action was never skipped and the port was always
thawed but, with the hardreset gone, ->prereset() determines that
there's no need for softreset and both ->softreset() and ->thaw() are
skipped.  This leads to stuck hotplug event in the IRQ status register
triggering hotplug event whenever IRQ is delieverd on the same IRQ.
As the controller shares the same IRQ for both ports, this happens on
every IO if one port is occpupied and the other isn't.

This patch fixes the problem by making sure that the port is thawed on
reset-skip path.

bko#11615 reports this problem.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockrwd@gmail.com>
Reported-by: Dan Andresan <danyer@gmail.com>
Reported-by: Arne Woerner <arne_woerner@yahoo.com>
Reported-by: Stefan Lippers-Hollmann <s.L-H@gmx.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-05 07:25:43 -05:00
Tejun Heo
b535708146 libata: don't use on-stack sense buffer
sense_buffer is used as DMA target and shouldn't be allocated on
stack.  Use ap->sector_buf instead.  This problem is spotted by Chuck
Ebbert.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-05 07:25:10 -05:00
Tejun Heo
cf9a590a9e libata: add no penalty retry request for EH device handling routines
Let -EAGAIN from EH device handling routines trigger EH retry without
consuming its tries count.  This will be used to implement link SPD
horkage which requires hardreset to adjust SPD without affecting other
EH decisions.  As it bypasses the forward progress guarantee provided
by the tries count, the requester is responsible for ensuring forward
progress.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:04:19 -05:00
Tejun Heo
c2c7a89c5e libata: improve probe failure handling
When link is flaky at high speed, it isn't uncommon for a device to
repeatedly fail probing sequence early after successfully negotiating
high link speed.  This often leads to consecutive hotplug events
without successful probing.

This patch improves libata EH such that it remembers probing trials
and if there have been more than two unsuccessful trials in the past
60 seconds, slows down link speed to 1.5Gbps.

As link speed negotiation is the duty of the PHY layer proper, the
goal of this fallback mechanism is to provide the last resort when
everything else fails, which unfortunately happens not too
infrequently, so no fancy 6->3->1.5 speeding down or highest
successful transmission speed seen kind of logics (yet).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:34 -05:00
Tejun Heo
a07d499b47 libata: add @spd_limit to sata_down_spd_limit()
Add @spd_limit to sata_down_spd_limit() so that the caller can specify
the SPD limit it wants.  This parameter doesn't get in the way even
when it's too low.  The closest possible limit is applied.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:22 -05:00
Tejun Heo
99cf610aa4 libata: clear dev->ering in smarter way
dev->ering used to be cleared together with the rest of ata_device in
ata_dev_init() which is called whenever a probing event occurs.
dev->ering is about to be used to track probing failures so it needs
to remain persistent over multiple porbing events.  This patch
achieves this by doing the following.

* Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
  clear between BEGIN and END.  ering is moved after END.  The split
  of persistent area is to allow hotter items remain at the head.

* ering is explicitly cleared on ata_dev_disable() and when device
  attach succeeds.  So, ering is persistent throug a device's life
  time (unless explicitly cleared of course) and also through periods
  inbetween disablement of an attached device and successful detection
  of the next one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:17 -05:00
Tejun Heo
678afac678 libata: move ata_dev_disable() to libata-eh.c
ata_dev_disable() is about to be more tightly integrated into EH
logic.  Move it to libata-eh.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:00 -05:00
Tejun Heo
d89293abd9 libata: fix EH device failure handling
The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.

This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:02:57 -05:00
Tejun Heo
ece180d1cf libata: perform port detach in EH
ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then
assumed EH context belongs to it and performed detach operation
itself.  However, UNLOADING doesn't disable all of EH and this could
lead to problems including triggering WARN_ON()'s in EH path.

This patch makes port detach behave more like other EH actions such
that ata_port_detach() requests EH to detach and waits for completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-12-28 22:43:21 -05:00
Tejun Heo
1eca4365be libata: beef up iterators
There currently are the following looping constructs.

* __ata_port_for_each_link() for all available links
* ata_port_for_each_link() for edge links
* ata_link_for_each_dev() for all devices
* ata_link_for_each_dev_reverse() for all devices in reverse order

Now there's a need for looping construct which is similar to
__ata_port_for_each_link() but iterates over PMP links before the host
link.  Instead of adding another one with long name, do the following
cleanup.

* Implement and export ata_link_next() and ata_dev_next() which take
  @mode parameter and can be used to build custom loop.
* Implement ata_for_each_link() and ata_for_each_dev() which take
  looping mode explicitly.

The following iteration modes are implemented.

* ATA_LITER_EDGE		: loop over edge links
* ATA_LITER_HOST_FIRST		: loop over all links, host link first
* ATA_LITER_PMP_FIRST		: loop over all links, PMP links first

* ATA_DITER_ENABLED		: loop over enabled devices
* ATA_DITER_ENABLED_REVERSE	: loop over enabled devices in reverse order
* ATA_DITER_ALL			: loop over all devices
* ATA_DITER_ALL_REVERSE		: loop over all devices in reverse order

This change removes exlicit device enabledness checks from many loops
and makes it clear which ones are iterated over in which direction.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-12-28 22:43:20 -05:00
Tejun Heo
19b723218b libata: fix last_reset timestamp handling
ehc->last_reset is used to ensure that resets are not issued too
close to each other.  It's initialized to jiffies minus one minute
on EH entry.  However, when new links are initialized after PMP is
probed, new links have zero for this timestamp resulting in long wait
depending on the current jiffies.

This patch makes last_set considered iff ATA_EHI_DID_RESET is set, in
which case last_reset is always initialized.  As an added precaution,
WARN_ON() is added so that warning is printed if last_reset is
in future.

This problem is spotted and debugged by Shane Huang.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shane Huang <Shane.Huang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-11-11 03:01:21 -05:00
Tejun Heo
90484ebfc9 libata: clear saved xfer_mode and ncq_enabled on device detach
libata EH saves xfer_mode and ncq_enabled at start to later set
DUBIOUS_XFER flag if it has changed.  These values need to be cleared
on device detach such that hot device swap doesn't accidentally miss
DUBIOUS_XFER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-27 23:55:40 -04:00
Tejun Heo
4a9c7b3359 libata: fix device iteration bugs
There were several places where only enabled devices should be
iterated over but device enabledness wasn't checked.

* IDENTIFY data 40 wire check in cable_is_40wire()
* xfer_mode/ncq_enabled saving in ata_scsi_error()
* DUBIOUS_XFER handling in ata_set_mode()

While at it, reformat comments in cable_is_40wire().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-27 23:55:12 -04:00
Tejun Heo
816ab89782 libata: set device class to NONE if phys_offline
Reset methods don't have access to phys link status for slave links
and may incorrectly indicate device presence causing unnecessary probe
failures for unoccupied links.  This patch clears device class to NONE
during post-reset processing if phys link is offline.

As on/offlineness semantics is strictly defined and used in multiple
places by the core layer, this won't change behavior for drivers which
don't use slave links.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-22 20:42:43 -04:00
Tejun Heo
a568d1d2e2 libata-eh: fix slave link EH action mask handling
Slave link action mask is transferred to master link and all the EH
actions are taken by the master link.  ata_eh_about_to_do() and
ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave
link actions during transfer.  This always sets ATA_PFLAG_RECOVERED
flag causing spurious "EH complete" messages.

Don't set ATA_PFLAG_RECOVERED for slave link actions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-22 20:40:21 -04:00
Tejun Heo
848e4c68c4 libata: transfer EHI control flags to slave ehc.i
ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior
of EH.  As only the master link is visible outside EH, these flags are
set only for the master link although they should also apply to the
slave link, which causes spurious EH messages during probe and
suspend/resume.

This patch transfers those two flags to slave ehc.i before performing
slave autopsy and reporting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-22 20:40:19 -04:00
Linus Torvalds
e26feff647 Merge branch 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits)
  doc/cdrom: Trvial documentation error, file not present
  block_dev: fix kernel-doc in new functions
  block: add some comments around the bio read-write flags
  block: mark bio_split_pool static
  block: Find bio sector offset given idx and offset
  block: gendisk integrity wrapper
  block: Switch blk_integrity_compare from bdev to gendisk
  block: Fix double put in blk_integrity_unregister
  block: Introduce integrity data ownership flag
  block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1
  bio.h: Remove unused conditional code
  block: remove end_{queued|dequeued}_request()
  block: change elevator to use __blk_end_request()
  gdrom: change to use __blk_end_request()
  memstick: change to use __blk_end_request()
  virtio_blk: change to use __blk_end_request()
  blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
  block: add lld busy state exporting interface
  block: Fix blk_start_queueing() to not kick a stopped queue
  include blktrace_api.h in headers_install
  ...
2008-10-10 10:52:45 -07:00
Jens Axboe
242f9dcb8b block: unify request timeout handling
Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:13 +02:00
Tejun Heo
11fc33da8d libata-eh: clear UNIT ATTENTION after reset
Resets make ATAPI devices raise UNIT ATTENTION which fails the next
command.  As resets can happen asynchronously for unrelated reasons,
this sometimes disrupts innocent users.  For example, reading DVD
fails after the system wakes up from suspend or the other device
sharing the channel went through bus error.

Clearing UA has some problems as it might clear UA which the userland
needs to know about.  However, UA after resets can only be about the
reset itself and benefits of clearing it overweights cons.  Missing UA
can only delay failure to one of the following commands anyway.  For
example, timeout while burning is in progress will trigger reset and
reset the device state and probably corrupt the burning run.  Although
the userland application won't get the UA, its pending writes will
fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-09-29 00:29:06 -04:00
Elias Oltmanns
45fabbb77b libata: Implement disk shock protection support
On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-09-29 00:27:54 -04:00
Tejun Heo
b1c72916ab libata: implement slave_link
Explanation taken from the comment of ata_slave_link_init().

 In libata, a port contains links and a link contains devices.  There
 is single host link but if a PMP is attached to it, there can be
 multiple fan-out links.  On SATA, there's usually a single device
 connected to a link but PATA and SATA controllers emulating TF based
 interface can have two - master and slave.

 However, there are a few controllers which don't fit into this
 abstraction too well - SATA controllers which emulate TF interface
 with both master and slave devices but also have separate SCR
 register sets for each device.  These controllers need separate links
 for physical link handling (e.g. onlineness, link speed) but should
 be treated like a traditional M/S controller for everything else
 (e.g. command issue, softreset).

 slave_link is libata's way of handling this class of controllers
 without impacting core layer too much.  For anything other than
 physical link handling, the default host link is used for both master
 and slave.  For physical link handling, separate @ap->slave_link is
 used.  All dirty details are implemented inside libata core layer.
 From LLD's POV, the only difference is that prereset, hardreset and
 postreset are called once more for the slave link, so the reset
 sequence looks like the following.

 prereset(M) -> prereset(S) -> hardreset(M) -> hardreset(S) ->
 softreset(M) -> postreset(M) -> postreset(S)

 Note that softreset is called only for the master.  Softreset resets
 both M/S by definition, so SRST on master should handle both (the
 standard method will work just fine).

As slave_link excludes PMP support and only code paths which deal with
the attributes of physical link are affected, all the changes are
localized to libata.h, libata-core.c and libata-eh.c.

 * ata_is_host_link() updated so that slave_link is considered as host
   link too.

 * iterator extended to iterate over the slave_link when using the
   underbarred version.

 * force param handling updated such that devno 16 is mapped to the
   slave link/device.

 * ata_link_on/offline() updated to return the combined result from
   master and slave link.  ata_phys_link_on/offline() are the direct
   versions.

 * EH autopsy and report are performed separately for master slave
   links.  Reset is udpated to implement the above described reset
   sequence.

Except for reset update, most changes are minor, many of them just
modifying dev->link to ata_dev_phys_link(dev) or using phys online
test instead.

After this update, LLDs can take full advantage of per-dev SCR
registers by simply turning on slave link.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-09-29 00:25:28 -04:00
Tejun Heo
da0e21d3fa libata: use ata_link_printk() when printing SError
SError belongs to link not port.  Use ata_link_printk() to print it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-22 02:19:44 -04:00
Tejun Heo
5dbfc9cb59 libata: always do follow-up SRST if hardreset returned -EAGAIN
As an optimization, follow-up SRST used to be skipped if
classification wasn't requested even when hardreset requested it via
-EAGAIN.  However, some hardresets can't wait for device readiness and
skipping SRST can cause timeout or other failures during revalidation.
Always perform follow-up SRST if hardreset returns -EAGAIN.  This
makes reset paths more predictable and thus less error-prone.

While at it, move hardreset error checking such that it's done right
after hardreset is finished.  This simplifies followup SRST condition
check a bit and makes the reset path easier to modify.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-22 02:19:41 -04:00
Tejun Heo
a674050e06 libata: fix EH action overwriting in ata_eh_reset()
ehc->i.action got accidentally overwritten to ATA_EH_HARD/SOFTRESET in
ata_eh_reset().  The original intention was to clear reset action
which wasn't selected.  This can cause unexpected behavior when other
EH actions are scheduled together with reset.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-22 02:19:39 -04:00
Tejun Heo
05944bdf6f libata: implement no[hs]rst force params
Implement force params nohrst, nosrst and norst.  This is to work
around reset related problems and ease debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-08-22 02:07:43 -04:00
Tejun Heo
3eabddb8ed libata-eh: update atapi_eh_request_sense() to take @dev instead of @qc
Update atapi_eh_request_sense() to take @dev, @sense_buf and
@dfl_sense_key instead of taking @qc and extracting information from
it.  This change is to make the function more generic and allow it to
be called from other places.

While at it, make cdb initialization use initializer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:33 -04:00
Tejun Heo
87fbc5a060 libata: improve EH internal command timeout handling
ATA_TMOUT_INTERNAL which was 30secs were used for all internal
commands which is way too long when something goes wrong.  This patch
implements command type based stepped timeouts.  Different command
types can use different timeouts and each command type can use
different timeout values after timeouts.

ie. the initial timeout is set to a value which should cover most of
the cases but not too long so that run away cases don't delay things
too much.  After the first try times out, the second try can use
longer timeout and if that one times out too, it can go for full 30sec
timeout.

IDENTIFYs use 5s - 10s - 30s timeout and all other commands use 5s -
10s timeouts.

This patch significantly cuts down the needed time to handle failure
cases while still allowing libata to work with nut job devices through
retries.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
d8af0eb604 libata: use ULONG_MAX to terminate reset timeout table
This doesn't introduce any functional changes.  This is to make reset
timeout table consistent with to-be-added command timeout tables.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
0a2c0f5615 libata: improve EH retry delay handling
EH retries were delayed by 5 seconds to ensure that resets don't occur
back-to-back.  However, this 5 second delay is superflous or excessive
in many cases.  For example, after IDENTIFY times out, there's no
reason to wait five more seconds before retrying.

This patch adds ehc->last_reset timestamp and record the timestamp for
the last reset trial or success and uses it to space resets by
ATA_EH_RESET_COOL_DOWN which is 5 secs and removes unconditional 5 sec
sleeps.

As this change makes inter-try waits often shorter and they're
redundant in nature, this patch also removes the "retrying..."
messages.

While at it, convert explicit rounding up division to DIV_ROUND_UP().

This change speeds up EH in many cases w/o sacrificing robustness.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
341c2c958e libata: consistently use msecs for time durations
libata has been using mix of jiffies and msecs for time druations.
This is getting confusing.  As writing sub HZ values in jiffies is
PITA and msecs_to_jiffies() can't be used as initializer, unify unit
for all time durations to msecs.  So, durations are in msecs and
deadlines are in jiffies.  ata_deadline() is added to compute deadline
from a start time and duration in msecs.

While at it, drop now superflous _msec suffix from arguments and
rename @timeout to @deadline if it represents a fixed point in time
rather than duration.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-07-14 15:59:32 -04:00
Tejun Heo
e0614db2a3 libata: ignore recovered PHY errors
No reason to get overzealous about recovered comm and data errors.
Some PHYs habitually sets them w/o no good reason and being draconian
about these soft error conditions doesn't seem to help anybody.

If need ever rises, we might need to add soft PHY error condition, say
AC_ERR_MAYBE_ATA_BUS and use it only to determine whether speed down
is necessary but I don't think that's very likely to happen.  It's far
more likely we'll get timeouts or fatal transmission errors if
recovered errors are so prominent that they hamper operation.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-19 17:51:47 -04:00
Tejun Heo
f046519fc8 libata: kill hotplug related race condition
Originally, whole reset processing was done while the port is frozen
and SError was cleared during @postreset().  This had two race
conditions.  1: hotplug could occur after reset but before SError is
cleared and libata won't know about it.  2: hotplug could occur after
all the reset is complete but before the port is thawed.  As all
events are cleared on thaw, the hotplug event would be lost.

Commit ac371987a8 kills the first race
by clearing SError during link resume but before link onlineness test.
However, this doesn't fix race #2 and in some cases clearing SError
after SRST is a good idea.

This patch solves this problem by cross checking link onlineness with
classification result after SError is cleared and port is thawed.
Reset is retried if link is online but all devices attached to the
link are unknown.  As all devices will be revalidated, this one-way
check is enough to ensure that all devices are detected and
revalidated reliably.

This, luckily, also fixes the cases where host controller returns
bogus status while harddrive is spinning up after hotplug making
classification run before the device sends the first FIS and thus
causes misdetection.

Low level drivers can bypass the logic by setting class explicitly to
ATA_DEV_NONE if ever necessary (currently none requires this).

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-19 17:51:47 -04:00
Tejun Heo
dc98c32cbe libata: move reset freeze/thaw handling into ata_eh_reset()
Previously reset freeze/thaw handling lived outside of ata_eh_reset()
mainly because the original PMP reset code needed the port frozen
while resetting all the fan-out ports, which is no longer the case.

This patch moves freeze/thaw handling into ata_eh_reset().
@prereset() and @postreset() are now called w/o freezing the port
although @prereset() an be called frozen if the port is frozen prior
to entering ata_eh_reset().

This makes code simpler and will help removing hotplug event related
races.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-19 17:51:47 -04:00
Tejun Heo
932648b007 libata: reorganize ata_eh_reset() no reset method path
Reorganize ata_eh_reset() such that @prereset() is called even when no
reset method is available and if block is used instead of goto to skip
actual reset.  This makes no reset case behave better (readiness wait)
and future changes easier.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-19 17:51:47 -04:00
Mark Lord
10acf3b0d3 libata: export ata_eh_analyze_ncq_error
Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv,
as suggested by Tejun.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 11:37:58 -04:00
Mark Lord
a6116c9e60 libata-eh set tf flags in NCQ EH result_tf
Fix mis-reporting of NCQ errors by ensuring that result_tf->flags
is properly initialized in libata-eh.  This allows ata_gen_ata_sense()
to report the failed block number correctly to SCSI after a media error
during NCQ.

This patch may also be a candidate for backporting to earlier kernels.
Without this fix, SCSI will fail I/O on the entire request rather
than just the bad sector.  That can be bad for a request that was
merged from many independent read reads from different tasks.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-25 01:11:37 -04:00
Tejun Heo
4f7faa3f2b libata: make EH fail gracefully if no reset method is available
When no reset method is available, libata currently oopses.  Although
the condition can't happen unless there's a bug in a low level driver,
oopsing isn't the best way to report the error condition.  Complain,
dump stack and fail reset instead.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-17 15:44:26 -04:00
Tejun Heo
45db2f6c95 libata: move link onlineness check out of softreset methods
Currently, SATA softresets should do link onlineness check before
actually performing SRST protocol but it doesn't really belong to
softreset.

This patch moves onlineness check in softreset to ata_eh_reset() and
ata_eh_followup_srst_needed() to clean up code and help future sata_mv
changes which need clear separation between SCR and TF accesses.

sata_fsl is peculiar in that its softreset really isn't softreset but
combination of hardreset and softreset.  This patch adds dummy private
->prereset to keep the current behavior but the driver really should
implement separate hard and soft resets and return -EAGAIN from
hardreset if it should be follwed by softreset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-17 15:44:25 -04:00
Tejun Heo
2a0c15ca39 libata: kill dead code paths in reset path
Some code paths which had been made obsolete by recent reset
simplification were still around.  Kill them.

* ata_eh_reset() checked for ATA_DEV_UNKNOWN to determine
  classification failure.  This is no longer applicable.

* ata_do_reset() should convert ATA_DEV_UNKNOWN to ATA_DEV_NONE
  regardless of reset result (e.g. -EAGAIN).

* LLDs don't need to convert ATA_DEV_UNKNOWN to ATA_DEV_NONE.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-04-17 15:44:25 -04:00
Tejun Heo
071f44b1d2 libata: implement PMP helpers
Implement helpers to test whether PMP is supported, attached and
determine pmp number to use when issuing SRST to a link.  While at it,
move ata_is_host_link() so that it's together with the two new PMP
helpers.

This change simplifies LLDs and helps making PMP support optional.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:25 -04:00
Tejun Heo
305d2a1ab1 libata: unify mechanism to request follow-up SRST
Previously, there were two ways to trigger follow-up SRST from
hardreset method - returning -EAGAIN and leaving all device classes
unmodified.  Drivers never used the latter mechanism and the only use
case for the former was when hardreset couldn't classify.

Drop the latter mechanism and let -EAGAIN mean "perform follow-up SRST
if classification is required".  This change removes unnecessary
follow-up SRSTs and simplifies reset implementations.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:23 -04:00
Tejun Heo
5958e3025f libata: move PMP SCR access failure during reset to ata_eh_reset()
If PMP fan-out reset fails and SCR isn't accessible, PMP should be
reset.  This used to be tested by sata_pmp_std_hardreset() and
communicated to EH by -ERESTART.  However, this logic is generic and
doesn't really have much to do with specific hardreset implementation.

This patch moves SCR access failure detection logic to ata_eh_reset()
where it belongs.  As this makes sata_pmp_std_hardreset() identical to
sata_std_hardreset(), the function is killed and replaced with the
standard method.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:23 -04:00
Tejun Heo
57c9efdfb3 libata: implement and use sata_std_hardreset()
Implement sata_std_hardreset(), which simply wraps around
sata_link_hardreset().  sata_std_hardreset() becomes new standard
hardreset method for sata_port_ops and sata_sff_hardreset() moves from
ata_base_port_ops to ata_sff_port_ops, which is where it really
belongs.

ata_is_builtin_hardreset() is added so that both
ata_std_error_handler() and ata_sff_error_handler() skip both builtin
hardresets if SCR isn't accessible.

piix_sidpr_hardreset() in ata_piix.c is identical to
sata_std_hardreset() in functionality and got replaced with the
standard function.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:23 -04:00
Tejun Heo
9363c3825e libata: rename SFF functions
SFF functions have confusing names.  Some have sff prefix, some have
bmdma, some std, some pci and some none.  Unify the naming by...

* SFF functions which are common to both BMDMA and non-BMDMA are
  prefixed with ata_sff_.

* SFF functions which are specific to BMDMA are prefixed with
  ata_bmdma_.

* SFF functions which are specific to PCI but apply to both BMDMA and
  non-BMDMA are prefixed with ata_pci_sff_.

* SFF functions which are specific to PCI and BMDMA are prefixed with
  ata_pci_bmdma_.

* Drop generic prefixes from LLD specific routines.  For example,
  bfin_std_dev_select -> bfin_dev_select.

The following renames are noteworthy.

  ata_qc_issue_prot() -> ata_sff_qc_issue()
  ata_pci_default_filter() -> ata_bmdma_mode_filter()
  ata_dev_try_classify() -> ata_sff_dev_classify()

This rename is in preparation of separating SFF support out of libata
core layer.  This patch strictly renames functions and doesn't
introduce any behavior difference.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:21 -04:00
Tejun Heo
03faab7827 libata: implement ATA_QCFLAG_RETRY
Currently whether a command should be retried after failure is
determined inside ata_eh_finish().  Add ATA_QCFLAG_RETRY and move the
logic into ata_eh_autopsy().  This makes things clearer and helps
extending retry determination logic.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-04-17 15:44:20 -04:00
Tejun Heo
a1efdaba2d libata: make reset related methods proper port operations
Currently reset methods are not specified directly in the
ata_port_operations table.  If a LLD wants to use custom reset
methods, it should construct and use a error_handler which uses those
reset methods.  It's done this way for two reasons.

First, the ops table already contained too many methods and adding
four more of them would noticeably increase the amount of necessary
boilerplate code all over low level drivers.

Second, as ->error_handler uses those reset methods, it can get
confusing.  ie. By overriding ->error_handler, those reset ops can be
made useless making layering a bit hazy.

Now that ops table uses inheritance, the first problem doesn't exist
anymore.  The second isn't completely solved but is relieved by
providing default values - most drivers can just override what it has
implemented and don't have to concern itself about higher level
callbacks.  In fact, there currently is no driver which actually
modifies error handling behavior.  Drivers which override
->error_handler just wraps the standard error handler only to prepare
the controller for EH.  I don't think making ops layering strict has
any noticeable benefit.

This patch makes ->prereset, ->softreset, ->hardreset, ->postreset and
their PMP counterparts propoer ops.  Default ops are provided in the
base ops tables and drivers are converted to override individual reset
methods instead of creating custom error_handler.

* ata_std_error_handler() doesn't use sata_std_hardreset() if SCRs
  aren't accessible.  sata_promise doesn't need to use separate
  error_handlers for PATA and SATA anymore.

* softreset is broken for sata_inic162x and sata_sx4.  As libata now
  always prefers hardreset, this doesn't really matter but the ops are
  forced to NULL using ATA_OP_NULL for documentation purpose.

* pata_hpt374 needs to use different prereset for the first and second
  PCI functions.  This used to be done by branching from
  hpt374_error_handler().  The proper way to do this is to use
  separate ops and port_info tables for each function.  Converted.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:18 -04:00
Tejun Heo
b558edddb1 libata: kill ata_ehi_schedule_probe()
ata_ehi_schedule_probe() was created to hide details of link-resuming
reset magic.  Now that all the softreset workarounds are gone,
scheduling probe is very simple - set probe_mask and request RESET.
Kill ata_ehi_schedule_probe() and open code it.  This also increases
consistency as ata_ehi_schedule_probe() couldn't cover individual
device probings so they were open-coded even when the helper existed.

While at it, define ATA_ALL_DEVICES as mask of all possible devices on
a link and always use it when requesting probe on link level for
simplicity and consistency.  Setting extra bits in the probe_mask
doesn't hurt anybody.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:16 -04:00
Tejun Heo
672b2d65ba libata: kill ATA_EHI_RESUME_LINK
ATA_EHI_RESUME_LINK has two functions - promote reset to hardreset if
ATA_LFLAG_HRST_TO_RESUME is set and preventing EH from shortcutting
reset action when probing is requested.  The former is gone now and
the latter can easily be achieved by making EH to perform at least one
reset if reset is requested, which also makes more sense than
depending on RESUME_LINK flag.

As ATA_EHI_RESUME_LINK was the only EHI reset modifier, this also
kills reset modifier handling.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:16 -04:00
Tejun Heo
cf48062658 libata: prefer hardreset
When both soft and hard resets are available, libata preferred
softreset till now.  The logic behind it was to be softer to devices;
however, this doesn't really help much.  Rationales for the change:

* BIOS may freeze lock certain things during boot and softreset can't
  unlock those.  This by itself is okay but during operation PHY event
  or other error conditions can trigger hardreset and the device may
  end up with different configuration.

  For example, after a hardreset, previously unlockable HPA can be
  unlocked resulting in different device size and thus revalidation
  failure.  Similar condition can occur during or after resume.

* Certain ATAPI devices require hardreset to recover after certain
  error conditions.  On PATA, this is done by issuing the DEVICE RESET
  command.  On SATA, COMRESET has equivalent effect.  The problem is
  that DEVICE RESET needs its own execution protocol.

  For SFF controllers with bare TF access, it can be easily
  implemented but more advanced controllers (e.g. ahci and sata_sil24)
  require specialized implementations.  Simply using hardreset solves
  the problem nicely.

* COMRESET initialization sequence is the norm in SATA land and many
  SATA devices don't work properly if only SRST is used.  For example,
  some PMPs behave this way and libata works around by always issuing
  hardreset if the host supports PMP.

  Like the above example, libata has developed a number of mechanisms
  aiming to promote softreset to hardreset if softreset is not going
  to work.  This approach is time consuming and error prone.

  Also, note that, dependingon how you read the specs, it could be
  argued that PMP fan-out ports require COMRESET to start operation.
  In fact, all the PMPs on the market except one don't work properly
  if COMRESET is not issued to fan-out ports after PMP reset.

* COMRESET is an integral part of SATA connection and any working
  device should be able to handle COMRESET properly.  After all, it's
  the way to signal hardreset during reboot.  This is the most used
  and recommended (at least by the ahci spec) method of resetting
  devices.

So, this patch makes libata prefer hardreset over softreset by making
the following changes.

* Rename ATA_EH_RESET_MASK to ATA_EH_RESET and use it whereever
  ATA_EH_{SOFT|HARD}RESET used to be used.  ATA_EH_{SOFT|HARD}RESET is
  now only used to tell prereset whether soft or hard reset will be
  issued.

* Strip out now unneeded promote-to-hardreset logics from
  ata_eh_reset(), ata_std_prereset(), sata_pmp_std_prereset() and
  other places.

Signed-off-by: Tejun Heo <htejun@gmail.com>
2008-04-17 15:44:15 -04:00
Tejun Heo
3ec25ebd69 libata: ATA_EHI_LPM should be ATA_EH_LPM
EH actions are ATA_EH_* not ATA_EHI_*.  Rename ATA_EHI_LPM to
ATA_EH_LPM.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-29 12:21:31 -04:00
Tejun Heo
eec59f76e9 libata: allow LLDs w/o any reset method
Some old SFF controllers don't have any way to reset the channel.
Currently, this isn't supported and libata EH causes an oops.  Allow
LLDs w/o any reset method and just assume ATA class in such cases.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-03-10 20:50:34 -04:00
Tejun Heo
3326732570 libata: implement libata.force module parameter
This patch implements libata.force module parameter which can
selectively override ATA port, link and device configurations
including cable type, SATA PHY SPD limit, transfer mode and NCQ.

For example, you can say "use 1.5Gbps for all fan-out ports attached
to the second port but allow 3.0Gbps for the PMP device itself, oh,
the device attached to the third fan-out port chokes on NCQ and
shouldn't go over UDMA4" by the following.

 libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-02-20 12:12:28 -05:00
Tejun Heo
75f9cafc2d libata: fix off-by-one in error categorization
ATA_ECAT_DUBIOUS_BASE was too high by one and thus all DUBIOUS error
categorizations were wrong.  This passed test because only ATA_BUS and
UNK_DEV were used during testing and the ones after them - ATA_BUS and
an overflowed entry - behaved similarly.

This patch fixes the problem by adding DUBIOUS_NONE category and use
it as base.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:17 -05:00
Tejun Heo
0dc36888d4 libata: rename ATA_PROT_ATAPI_* to ATAPI_PROT_*
ATA_PROT_ATAPI_* are ugly and naming schemes between ATA_PROT_* and
ATA_PROT_ATAPI_* are inconsistent causing confusion.  Rename them to
ATAPI_PROT_* and make them consistent with ATA counterpart.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-01-23 05:24:14 -05:00
Andrew Morton
e6a73ab1c8 drivers/ata/libata-eh.c: fix printk warning
drivers/ata/libata-eh.c: In function `ata_port_pbar_desc':
drivers/ata/libata-eh.c:215: warning: long long unsigned int format, long unsigned int arg (arg 4)

Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:13 -05:00
Jeff Garzik
e39eec13ff [libata] Build fix WRT ata_is_xxx() new API introduction
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-01-23 05:24:11 -05:00
Tejun Heo
76326ac1ac libata: implement fast speed down for unverified data transfer mode
It's very likely that the configured data transfer mode is the wrong
one if device fails data transfers right after initial data transfer
mode configuration (including NCQ on/off and xfermode).  libata EH
needs to speed down fast before upper layers give up on probing.

This patch implement fast speed down rules to handle such cases
better.  Error occured while data transfer hasn't been verified
trigger fast back-to-back speed down actions until data transfer
works.

This change will make cable mis-detection and other initial
configuration problems corrected before partition scanning code gives
up.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:11 -05:00
Tejun Heo
00115e0f5b libata: implement ATA_DFLAG_DUBIOUS_XFER
ATA_DFLAG_DUBIOUS_XFER is set whenever data transfer speed or method
changes and gets cleared when data transfer command succeeds in the
newly configured transfer mode.

This will be used to improve speed down logic.

Signed-off-by: Tejun Heo <htejun@gmail.com<
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:11 -05:00
Tejun Heo
663f99b86a libata: adjust speed down rules
Speed down rules were too conservative.  Adjust them a bit.

* More than 10 timeouts can't happen in 5 minutes as command timeout
  is 30secs.  Lower the limit for rule #1 to 6.

* 10 timeouts is too high for rule #3 too.  Lower it to 6.

* SATAPI can benefit from falling back to PIO too.  Allow SATAPI
  devices to fall back to PIO.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:11 -05:00
Tejun Heo
3884f7b0a8 libata: clean up EH speed down implementation
Clean up EH speed down implementation.

* is_io boolean variable is replaced eflags.  is_io is ATA_EFLAG_IS_IO.

* Error categories now have names.

* Better comments.

* Reorder 5min and 10min rules in ata_eh_speed_down_verdict()

* Use local variable @link to cache @dev->link in ata_eh_speed_down()

These changes are to improve readability and ease further changes.
This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:10 -05:00
Tejun Heo
6f1d1e3a03 libata: move ata_set_mode() to libata-eh.c
Move ata_set_mode() to libata-eh.c.  ata_set_mode() is surely an EH
action and will be more tightly coupled with the rest of error
handling.  Move it to libata-eh.c.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:10 -05:00
Tejun Heo
02c05a27e8 libata: factor out ata_eh_schedule_probe()
Factor out ata_eh_schedule_probe() from ata_eh_handle_dev_fail() and
ata_eh_recover().  This is to improve maintainability and make future
changes easier.

In the previous revision, ata_dev_enabled() test was accidentally
dropped while factoring out.  This problem was spotted by Bartlomiej.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-23 05:24:10 -05:00
Shaohua Li
bd3adca52b libata-acpi: add ACPI _PSx method
ACPI spec (ver 3.0a, p289) requires IDE power on/off executes ACPI _PSx
methods. As recently most PATA drivers use libata, this patch adds _PSx
method support in libata. ACPI spec doesn't mention if SATA requires the
same _PSx method.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-01-23 05:24:09 -05:00
Tejun Heo
4ccd3329a2 libata: don't normalize UNKNOWN to NONE after reset
After non-classifying reset, ehc->classes[] could contain
ATA_DEV_UNKNOWN which used to be normalized to ATA_DEV_NONE for
consistency.  However, this causes unfortunate side effect for drivers
which have non-classifying hardresets (e.g. sata_nv) by making
hardreset report ATA_DEV_NONE for non-classifying resets and thus
makes EH believe that the port is unoccupied and recovery can be
skipped.  The end result is that after a device is swapped with
another one, the new device isn't attached after the old one is
detached.

This patch makes ata_eh_reset() not normalize UNKNOWN to NONE after
non-classifying resets.  This fixes the above problem.  As UNKNOWN and
NONE are handled differently by only EH hotplug logic, this doesn't
cause other behavior changes.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-10 16:53:22 -05:00
Tejun Heo
2695e36616 libata-pmp: propagate timeout to host link
Timeout on downstream command may indicate transmission problem on
host link.  Propagate timeouts to host link.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-10 16:53:16 -05:00
Tejun Heo
f2dfc1a12b libata: update atapi_eh_request_sense() such that lbam/lbah contains buffer size
While updating lbam/h for ATAPI commands, atapi_eh_request_sense() was
left out.  Update it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-17 20:33:15 -05:00
Tejun Heo
abb6a88974 libata: report protocol and full CDB on error
Protocol and CDB allocation size field are important in determining
what went wrong with ATAPI commands.  Report them on failure.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-12-01 17:35:58 -05:00
Adrian Bunk
21bef6dd2b libata: remove unused functions
This patch removes the following obsolete functions:
- libata-core.c: __sata_phy_reset()
- libata-core.c: sata_phy_reset()
- libata-eh.c: ata_qc_timeout()
- libata-eh.c: ata_eng_timeout()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tejun Heo <htejun@gmail.com>
2007-11-19 12:28:09 +09:00
Tejun Heo
dfcc173d71 libata: consider errors not associated with commands for speed down
libata EH used to ignore errors not associated with commands when
determining whether speed down is necessary or not.  This leads to the
following problems.

* Errors not associated with commands can occur indefinitely without
  libata EH taking corrective actions.

* Upstream link errors don't trigger speed down when PMP is attached
  to it and commands issued to downstream device trigger errors on the
  upstream link.

This patch makes ata_eh_link_autopsy() consider errors not associated
with command for speed down.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-03 08:47:26 -04:00
Tejun Heo
08cf69d005 libata: more robust reset failure handling
Reset failure is a critical error.  It results in disabling the link
requiring user intervention to re-enable it.  Make reset failure
handling more robust such that libata EH doesn't give up too early.

* Temporary glitches during hardreset may lead to classification
  failure when there's no softreset available.  Retry instead of
  giving up.

* Initial softreset or follow up softreset may fail classification.
  Move classification error handling block out of followup softreset
  block such that both cases are handled and retry instead of giving
  up.  Also, on the last try, give ATA class a blind shot.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-03 08:46:54 -04:00
Tejun Heo
416dc9ed20 libata: cosmetic clean up / reorganization of ata_eh_reset()
Clean up and reorganize ata_eh_reset() to ease further changes.

* Cache ARRAY_SIZE(ata_eh_reset_timeouts) in @max_tries.
* Cache link->flags in @lflags.
* Move failure handling block to the end of the function and unnest
  both success and failure handling blocks.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-03 08:46:54 -04:00
Tejun Heo
cd955463bb libata: fix timing computation in ata_eh_reset()
As jiffies changes asynchronously, it needs to be cached if unchanging
timestamp is needed.  The code in ata_eh_reset() intended to do that
with @now but never actually did it.  Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-03 08:46:54 -04:00
Tejun Heo
e027bd36c1 libata: implement and use ATA_QCFLAG_QUIET
Implement ATA_QCFLAG_QUIET which indicates that there's no need to
report if the command fails with AC_ERR_DEV and set it for passthrough
commands.

Combined with previous changes, this now makes device errors for all
direct commands reported directly to the issuer without going through
EH actions and reporting.

Note that EH is still invoked after non-IO device errors to determine
the nature of the error and resume command execution (some controller
requires special care after error to continue).  It just performs
default maintenance after error, examines what's going on, realizes
that it's none of its business and reports the command failure without
logging any error messages.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-30 09:59:43 -04:00
Tejun Heo
f90f0828e5 libata: stop being overjealous about non-IO commands
libata EH always revalidated device and retried failed command after
error except for ATAPI CCs.  This is unnecessary and hinders with
users issuing direct commands.  This patch makes the following
changes.

* Make sata_sil24 not request ATA_EH_REVALIDATE on device errors.
  sil24 is the only driver which does this.  All others let libata EH
  core code decide.

* Don't request revalidation after device error of non-IO command.
  Revalidation doesn't really help anybody.  As ATA_EH_REVALIDATE
  isn't set by default, there's no reason to clear it after sense data
  is read.  Kill ATA_EH_REVALIDATE clearing code while at it.

* Don't retry non-IO command after device error.  Device has rejected
  the command.  There's no point in retrying.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-30 09:59:42 -04:00
Kristen Carlson Accardi
ca77329fb7 [libata] Link power management infrastructure
Device Initiated Power Management, which is defined
in SATA 2.5 can be enabled for disks which support it.
This patch enables DIPM when the user sets the link
power management policy to "min_power".

Additionally, libata drivers can define a function
(enable_pm) that will perform hardware specific actions to
enable whatever power management policy the user set up
for Host Initiated Power management (HIPM).
This power management policy will be activated after all
disks have been enumerated and intialized.  Drivers should
also define disable_pm, which will turn off link power
management, but not change link power management policy.

Documentation/scsi/link_power_management_policy.txt has additional
information.

Signed-off-by:  Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2007-10-29 11:00:35 -04:00
Tejun Heo
4fb4615bc9 libata: no need to speed down if already at PIO0
After reset, transfer mode is always PIO0 regardless of
dev->xfer_mask.  Check dev->pio_mode before trying to slow down after
configuration failure.  This prevents bogus speed down before device
is actually configured.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-29 06:21:33 -04:00
Tejun Heo
cdeab11407 libata: relocate forcing PIO0 on reset
Forcing PIO0 on reset was done inside ata_bus_softreset(), which is a
bit out of place as it should be applied to all resets - hard, soft
and implementation which don't use ata_bus_softreset().  Relocate it
such that...

* For new EH, it's done in ata_eh_reset() before calling prereset.

* For old EH, it's done before calling ap->ops->phy_reset() in
  ata_bus_probe().

This makes PIO0 forced after all resets.  Another difference is that
reset itself is done after PIO0 is forced.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-29 06:21:33 -04:00
Tejun Heo
054a5fbace libata: track SLEEP state and issue SRST to wake it up
ATA devices in SLEEP mode don't respond to any commands.  SRST is
necessary to wake it up.  Till now, when a command is issued to a
device in SLEEP mode, the command times out, which makes EH reset the
device and retry the command after that, causing a long delay.

This patch makes libata track SLEEP state and issue SRST automatically
if a command is about to be issued to a device in SLEEP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Bruce Allen <ballen@gravity.phys.uwm.edu>
Cc: Andrew Paprocki <andrew@ishiboo.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-29 06:15:25 -04:00
Tejun Heo
0e06d9ce7a libata: cosmetic clean up in ata_eh_reset()
Local variable @action usage in ata_eh_reset() is a bit confusing.
It's used only to cache ehc->i.action to test reset masks after
clearing it; however, due to the generic name "action", it's easy to
misinterpret the local variable as containing the selected reset
method later.  Also, the reason for caching the original value is easy
to miss.

This patch renames @action to @tmp_action and make it buffer newly
selected value instead to improve readability.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-25 02:02:02 -04:00
Jeff Garzik
2dcb407e61 [libata] checkpatch-inspired cleanups
Tackle the relatively sane complaints of checkpatch --file.

The vast majority is indentation and whitespace changes, the rest are

* #include fixes
* printk KERN_xxx prefix addition
* BSS/initializer cleanups

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2007-10-23 20:59:42 -04:00
Jeff Garzik
2855568b1e [libata] struct pci_dev related cleanups
* remove pointless pci_dev_to_dev() wrapper.  Just directly reference
  the embedded struct device like everyone else does.

* pata_cs5520: delete cs5520_remove_one(), it was a duplicate of
  ata_pci_remove_one()

* linux/libata.h: don't bother including linux/pci.h, we don't need it.
  Simply declare 'struct pci_dev' and assume interested parties will
  include the header, as they should be doing anyway.

* linux/libata.h: consolidate all CONFIG_PCI declarations into a
  single location in the header.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2007-10-12 14:55:47 -04:00
Tejun Heo
b06ce3e51e libata: use ata_exec_internal() for PMP register access
PMP registers used to be accessed with dedicated accessors ->pmp_read
and ->pmp_write.  During reset, those callbacks are called with the
port frozen so they should be able to run without depending on
interrupt delivery.  To achieve this, they were implemented polling.

However, as resetting the host port makes the PMP to isolate fan-out
ports until SError.X is cleared, resetting fan-out ports while port is
frozen doesn't buy much additional safety.

This patch updates libata PMP support such that PMP registers are
accessed using regular ata_exec_internal() mechanism and kills
->pmp_read/write() callbacks.  The following changes are made.

* PMP access helpers - sata_pmp_read_init_tf(), sata_pmp_read_val(),
  sata_pmp_write_init_tf() are folded into sata_pmp_read/write() which
  are now standalone PMP register access functions.

* sata_pmp_read/write() returns err_mask instead of rc.  This is
  consistent with other functions which issue internal commands and
  allows more detailed error reporting.

* ahci interrupt handler is modified to ignore BAD_PMP and
  spurious/illegal completion IRQs while reset is in progress.  These
  conditions are expected during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:47 -04:00
Tejun Heo
afaa5c373d libata: implement ATA_PFLAG_RESETTING
Implement ATA_PFLAG_RESETTING.  This flag is set while reset is in
progress.  It's set before prereset is called and cleared after reset
fails or postreset is finished.

This flag itself doesn't have any function.  It will be used by LLDs
to tell whether reset is in progress if it needs to behave differently
during reset.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:47 -04:00
Tejun Heo
2b789108fc libata: add @timeout to ata_exec_internal[_sg]()
Add @timeout argument to ata_exec_internal[_sg]().  If 0, default
timeout ata_probe_timeout is used.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:47 -04:00
Tejun Heo
9073868376 libata: wrap schedule_timeout_uninterruptible() in loop
Tasks in uninterruptible sleep might be woken up by unrelated events
and should check whether the condition it was waiting for has actually
triggered.  Wrap schedule_timeout_uninterruptible() in loop to achieve
it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:46 -04:00
Tejun Heo
94ff3d5408 libata: skip suppress reporting if ATA_EHI_QUIET
ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used during initial probing
to skip exception analysis and reporting.  Usually, there's nothing to
report but on some allowed but rare corner cases (e.g. phy status
changed interrupt when IRQ is enabled on frozen port - this happens if
IRQ pending status isn't cleared in the IRQ router or controller)
exception messages get printed.

Skip reporting if ATA_EHI_QUIET is set.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:46 -04:00
Robert Hancock
1333e19434 libata: add human-readable error value decoding
This adds human-readable decoding of the ATA status and error registers
(similar to what drivers/ide does) as well as the SATA Serror register
to libata error handling output.  This prevents the need to pore
through standards documents to figure out the meaning of the bits
in these registers when looking at error reports.  Some bits that
drivers/ide decoded are not decoded here, since the bits are either
command-dependent or obsolete, and properly parsing them would add
too much complexity.

Signed-off-by: Robert Hancock <hancockr@shaw.ca>

[edited slightly to make output a bit more symmetric]
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:45 -04:00
Tejun Heo
633273a3ed libata-pmp: hook PMP support and enable it
Hook PMP support into libata and enable it.  Connect SCR and probing
functions, and update ata_dev_classify() to detect PMP.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-12 14:55:44 -04:00