X-Git-Url: https://err.no/cgi-bin/gitweb.cgi?a=blobdiff_plain;ds=sidebyside;f=Documentation%2FDocBook%2Flibata.tmpl;h=e97c3231454173e938f9554929420c441683a9de;hb=ea6e1e94f2cb9ae54bd1428e1ef3e84a749ceed8;hp=b2ec780bcda10da1f835de1527331d054afe592a;hpb=0fbbbf2bde4da5cb01a949c3d7b21c0627f520a8;p=linux-2.6
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index b2ec780bcd..e97c323145 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -120,14 +120,27 @@ void (*dev_config) (struct ata_port *, struct ata_device *);
void (*set_piomode) (struct ata_port *, struct ata_device *);
void (*set_dmamode) (struct ata_port *, struct ata_device *);
-void (*post_set_mode) (struct ata_port *ap);
+void (*post_set_mode) (struct ata_port *);
+unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int);
Hooks called prior to the issue of SET FEATURES - XFER MODE
- command. dev->pio_mode is guaranteed to be valid when
- ->set_piomode() is called, and dev->dma_mode is guaranteed to be
- valid when ->set_dmamode() is called. ->post_set_mode() is
+ command. The optional ->mode_filter() hook is called when libata
+ has built a mask of the possible modes. This is passed to the
+ ->mode_filter() function which should return a mask of valid modes
+ after filtering those unsuitable due to hardware limits. It is not
+ valid to use this interface to add modes.
+
+
+ dev->pio_mode and dev->dma_mode are guaranteed to be valid when
+ ->set_piomode() and when ->set_dmamode() is called. The timings for
+ any other drive sharing the cable will also be valid at this point.
+ That is the library records the decisions for the modes of each
+ drive on a channel before it attempts to set any of them.
+
+
+ ->post_set_mode() is
called unconditionally, after the SET FEATURES - XFER MODE
command completes successfully.
@@ -156,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
+ PIO data read/write
+
+void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
+
+
+
+All bmdma-style drivers must implement this hook. This is the low-level
+operation that actually copies the data bytes during a PIO data
+transfer.
+Typically the driver
+will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or
+ata_mmio_data_xfer().
+
+
+
+
ATA command execute
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
@@ -191,11 +220,10 @@ command.
u8 (*check_status)(struct ata_port *ap);
u8 (*check_altstatus)(struct ata_port *ap);
-u8 (*check_err)(struct ata_port *ap);
- Reads the Status/AltStatus/Error ATA shadow register from
+ Reads the Status/AltStatus ATA shadow register from
hardware. On some hardware, reading the Status register has
the side effect of clearing the interrupt condition.
Most drivers for taskfile-based hardware use
@@ -230,21 +258,30 @@ void (*dev_select)(struct ata_port *ap, unsigned int device);
- Reset ATA bus
+ Private tuning method
-void (*phy_reset) (struct ata_port *ap);
+void (*set_mode) (struct ata_port *ap);
- The very first step in the probe phase. Actions vary depending
- on the bus type, typically. After waking up the device and probing
- for device presence (PATA and SATA), typically a soft reset
- (SRST) will be performed. Drivers typically use the helper
- functions ata_bus_reset() or sata_phy_reset() for this hook.
- Many SATA drivers use sata_phy_reset() or call it from within
- their own phy_reset() functions.
+ By default libata performs drive and controller tuning in
+ accordance with the ATA timing rules and also applies blacklists
+ and cable limits. Some controllers need special handling and have
+ custom tuning rules, typically raid controllers that use ATA
+ commands but do not actually do drive timing.
+
+
+ This hook should not be used to replace the standard controller
+ tuning logic when a controller has quirks. Replacing the default
+ tuning logic in that case would bypass handling for drive and
+ bridge quirks that may be important to data reliability. If a
+ controller needs to filter the mode selection it should use the
+ mode_filter hook instead.
+
+
+
Control PCI IDE BMDMA engine
@@ -315,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc);
- Timeout (error) handling
+ Exception and probe handling (EH)
void (*eng_timeout) (struct ata_port *ap);
+void (*phy_reset) (struct ata_port *ap);
+
+
+
+Deprecated. Use ->error_handler() instead.
+
+
+
+void (*freeze) (struct ata_port *ap);
+void (*thaw) (struct ata_port *ap);
+
+
+
+ata_port_freeze() is called when HSM violations or some other
+condition disrupts normal operation of the port. A frozen port
+is not allowed to perform any operation until the port is
+thawed, which usually follows a successful reset.
+
+
+
+The optional ->freeze() callback can be used for freezing the port
+hardware-wise (e.g. mask interrupt and stop DMA engine). If a
+port cannot be frozen hardware-wise, the interrupt handler
+must ack and clear interrupts unconditionally while the port
+is frozen.
+
+
+The optional ->thaw() callback is called to perform the opposite of ->freeze():
+prepare the port for normal operation once again. Unmask interrupts,
+start DMA engine, etc.
+
+
+
+void (*error_handler) (struct ata_port *ap);
-This is a high level error handling function, called from the
-error handling thread, when a command times out. Most newer
-hardware will implement its own error handling code here. IDE BMDMA
-drivers may use the helper function ata_eng_timeout().
+->error_handler() is a driver's hook into probe, hotplug, and recovery
+and other exceptional conditions. The primary responsibility of an
+implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set
+of EH hooks as arguments:
+
+
+
+'prereset' hook (may be NULL) is called during an EH reset, before any other actions
+are taken.
+
+
+
+'postreset' hook (may be NULL) is called after the EH reset is performed. Based on
+existing conditions, severity of the problem, and hardware capabilities,
+
+
+
+Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be
+called to perform the low-level EH reset.
+
+
+
+void (*post_internal_cmd) (struct ata_queued_cmd *qc);
+
+
+
+Perform any hardware-specific actions necessary to finish processing
+after executing a probe-time or EH-time command via ata_exec_internal().
@@ -666,7 +761,7 @@ and other resources, etc.
ata_scsi_error()
- ata_scsi_error() is the current hostt->eh_strategy_handler()
+ ata_scsi_error() is the current transportt->eh_strategy_handler()
for libata. As discussed above, this will be entered in two
cases - timeout and ATAPI error completion. This function
calls low level libata driver's eng_timeout() callback, the
@@ -787,6 +882,722 @@ and other resources, etc.
!Idrivers/scsi/libata-scsi.c
+
+ ATA errors & exceptions
+
+
+ This chapter tries to identify what error/exception conditions exist
+ for ATA/ATAPI devices and describe how they should be handled in
+ implementation-neutral way.
+
+
+
+ The term 'error' is used to describe conditions where either an
+ explicit error condition is reported from device or a command has
+ timed out.
+
+
+
+ The term 'exception' is either used to describe exceptional
+ conditions which are not errors (say, power or hotplug events), or
+ to describe both errors and non-error exceptional conditions. Where
+ explicit distinction between error and exception is necessary, the
+ term 'non-error exception' is used.
+
+
+
+ Exception categories
+
+ Exceptions are described primarily with respect to legacy
+ taskfile + bus master IDE interface. If a controller provides
+ other better mechanism for error reporting, mapping those into
+ categories described below shouldn't be difficult.
+
+
+
+ In the following sections, two recovery actions - reset and
+ reconfiguring transport - are mentioned. These are described
+ further in .
+
+
+
+ HSM violation
+
+ This error is indicated when STATUS value doesn't match HSM
+ requirement during issuing or excution any ATA/ATAPI command.
+
+
+
+ Examples
+
+
+
+ ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying
+ to issue a command.
+
+
+
+
+
+ !BSY && !DRQ during PIO data transfer.
+
+
+
+
+
+ DRQ on command completion.
+
+
+
+
+
+ !BSY && ERR after CDB tranfer starts but before the
+ last byte of CDB is transferred. ATA/ATAPI standard states
+ that "The device shall not terminate the PACKET command
+ with an error before the last byte of the command packet has
+ been written" in the error outputs description of PACKET
+ command and the state diagram doesn't include such
+ transitions.
+
+
+
+
+
+
+ In these cases, HSM is violated and not much information
+ regarding the error can be acquired from STATUS or ERROR
+ register. IOW, this error can be anything - driver bug,
+ faulty device, controller and/or cable.
+
+
+
+ As HSM is violated, reset is necessary to restore known state.
+ Reconfiguring transport for lower speed might be helpful too
+ as transmission errors sometimes cause this kind of errors.
+
+
+
+
+ ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)
+
+
+ These are errors detected and reported by ATA/ATAPI devices
+ indicating device problems. For this type of errors, STATUS
+ and ERROR register values are valid and describe error
+ condition. Note that some of ATA bus errors are detected by
+ ATA/ATAPI devices and reported using the same mechanism as
+ device errors. Those cases are described later in this
+ section.
+
+
+
+ For ATA commands, this type of errors are indicated by !BSY
+ && ERR during command execution and on completion.
+
+
+ For ATAPI commands,
+
+
+
+
+
+ !BSY && ERR && ABRT right after issuing PACKET
+ indicates that PACKET command is not supported and falls in
+ this category.
+
+
+
+
+
+ !BSY && ERR(==CHK) && !ABRT after the last
+ byte of CDB is transferred indicates CHECK CONDITION and
+ doesn't fall in this category.
+
+
+
+
+
+ !BSY && ERR(==CHK) && ABRT after the last byte
+ of CDB is transferred *probably* indicates CHECK CONDITION and
+ doesn't fall in this category.
+
+
+
+
+
+
+ Of errors detected as above, the followings are not ATA/ATAPI
+ device errors but ATA bus errors and should be handled
+ according to .
+
+
+
+
+
+ CRC error during data transfer
+
+
+ This is indicated by ICRC bit in the ERROR register and
+ means that corruption occurred during data transfer. Upto
+ ATA/ATAPI-7, the standard specifies that this bit is only
+ applicable to UDMA transfers but ATA/ATAPI-8 draft revision
+ 1f says that the bit may be applicable to multiword DMA and
+ PIO.
+
+
+
+
+
+ ABRT error during data transfer or on completion
+
+
+ Upto ATA/ATAPI-7, the standard specifies that ABRT could be
+ set on ICRC errors and on cases where a device is not able
+ to complete a command. Combined with the fact that MWDMA
+ and PIO transfer errors aren't allowed to use ICRC bit upto
+ ATA/ATAPI-7, it seems to imply that ABRT bit alone could
+ indicate tranfer errors.
+
+
+ However, ATA/ATAPI-8 draft revision 1f removes the part
+ that ICRC errors can turn on ABRT. So, this is kind of
+ gray area. Some heuristics are needed here.
+
+
+
+
+
+
+
+ ATA/ATAPI device errors can be further categorized as follows.
+
+
+
+
+
+ Media errors
+
+
+ This is indicated by UNC bit in the ERROR register. ATA
+ devices reports UNC error only after certain number of
+ retries cannot recover the data, so there's nothing much
+ else to do other than notifying upper layer.
+
+
+ READ and WRITE commands report CHS or LBA of the first
+ failed sector but ATA/ATAPI standard specifies that the
+ amount of transferred data on error completion is
+ indeterminate, so we cannot assume that sectors preceding
+ the failed sector have been transferred and thus cannot
+ complete those sectors successfully as SCSI does.
+
+
+
+
+
+ Media changed / media change requested error
+
+
+ <<TODO: fill here>>
+
+
+
+
+ Address error
+
+
+ This is indicated by IDNF bit in the ERROR register.
+ Report to upper layer.
+
+
+
+
+ Other errors
+
+
+ This can be invalid command or parameter indicated by ABRT
+ ERROR bit or some other error condition. Note that ABRT
+ bit can indicate a lot of things including ICRC and Address
+ errors. Heuristics needed.
+
+
+
+
+
+
+
+ Depending on commands, not all STATUS/ERROR bits are
+ applicable. These non-applicable bits are marked with
+ "na" in the output descriptions but upto ATA/ATAPI-7
+ no definition of "na" can be found. However,
+ ATA/ATAPI-8 draft revision 1f describes "N/A" as
+ follows.
+
+
+
+
+ 3.2.3.3a N/A
+
+
+ A keyword the indicates a field has no defined value in
+ this standard and should not be checked by the host or
+ device. N/A fields should be cleared to zero.
+
+
+
+
+
+
+
+ So, it seems reasonable to assume that "na" bits are
+ cleared to zero by devices and thus need no explicit masking.
+
+
+
+
+
+ ATAPI device CHECK CONDITION
+
+
+ ATAPI device CHECK CONDITION error is indicated by set CHK bit
+ (ERR bit) in the STATUS register after the last byte of CDB is
+ transferred for a PACKET command. For this kind of errors,
+ sense data should be acquired to gather information regarding
+ the errors. REQUEST SENSE packet command should be used to
+ acquire sense data.
+
+
+
+ Once sense data is acquired, this type of errors can be
+ handled similary to other SCSI errors. Note that sense data
+ may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR
+ && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such
+ cases, the error should be considered as an ATA bus error and
+ handled according to .
+
+
+
+
+
+ ATA device error (NCQ)
+
+
+ NCQ command error is indicated by cleared BSY and set ERR bit
+ during NCQ command phase (one or more NCQ commands
+ outstanding). Although STATUS and ERROR registers will
+ contain valid values describing the error, READ LOG EXT is
+ required to clear the error condition, determine which command
+ has failed and acquire more information.
+
+
+
+ READ LOG EXT Log Page 10h reports which tag has failed and
+ taskfile register values describing the error. With this
+ information the failed command can be handled as a normal ATA
+ command error as in and all
+ other in-flight commands must be retried. Note that this
+ retry should not be counted - it's likely that commands
+ retried this way would have completed normally if it were not
+ for the failed command.
+
+
+
+ Note that ATA bus errors can be reported as ATA device NCQ
+ errors. This should be handled as described in .
+
+
+
+ If READ LOG EXT Log Page 10h fails or reports NQ, we're
+ thoroughly screwed. This condition should be treated
+ according to .
+
+
+
+
+
+ ATA bus error
+
+
+ ATA bus error means that data corruption occurred during
+ transmission over ATA bus (SATA or PATA). This type of errors
+ can be indicated by
+
+
+
+
+
+
+ ICRC or ABRT error as described in .
+
+
+
+
+
+ Controller-specific error completion with error information
+ indicating transmission error.
+
+
+
+
+
+ On some controllers, command timeout. In this case, there may
+ be a mechanism to determine that the timeout is due to
+ transmission error.
+
+
+
+
+
+ Unknown/random errors, timeouts and all sorts of weirdities.
+
+
+
+
+
+
+ As described above, transmission errors can cause wide variety
+ of symptoms ranging from device ICRC error to random device
+ lockup, and, for many cases, there is no way to tell if an
+ error condition is due to transmission error or not;
+ therefore, it's necessary to employ some kind of heuristic
+ when dealing with errors and timeouts. For example,
+ encountering repetitive ABRT errors for known supported
+ command is likely to indicate ATA bus error.
+
+
+
+ Once it's determined that ATA bus errors have possibly
+ occurred, lowering ATA bus transmission speed is one of
+ actions which may alleviate the problem. See for more information.
+
+
+
+
+
+ PCI bus error
+
+
+ Data corruption or other failures during transmission over PCI
+ (or other system bus). For standard BMDMA, this is indicated
+ by Error bit in the BMDMA Status register. This type of
+ errors must be logged as it indicates something is very wrong
+ with the system. Resetting host controller is recommended.
+
+
+
+
+
+ Late completion
+
+
+ This occurs when timeout occurs and the timeout handler finds
+ out that the timed out command has completed successfully or
+ with error. This is usually caused by lost interrupts. This
+ type of errors must be logged. Resetting host controller is
+ recommended.
+
+
+
+
+
+ Unknown error (timeout)
+
+
+ This is when timeout occurs and the command is still
+ processing or the host and device are in unknown state. When
+ this occurs, HSM could be in any valid or invalid state. To
+ bring the device to known state and make it forget about the
+ timed out command, resetting is necessary. The timed out
+ command may be retried.
+
+
+
+ Timeouts can also be caused by transmission errors. Refer to
+ for more details.
+
+
+
+
+
+ Hotplug and power management exceptions
+
+
+ <<TODO: fill here>>
+
+
+
+
+
+
+
+ EH recovery actions
+
+
+ This section discusses several important recovery actions.
+
+
+
+ Clearing error condition
+
+
+ Many controllers require its error registers to be cleared by
+ error handler. Different controllers may have different
+ requirements.
+
+
+
+ For SATA, it's strongly recommended to clear at least SError
+ register during error handling.
+
+
+
+
+ Reset
+
+
+ During EH, resetting is necessary in the following cases.
+
+
+
+
+
+
+ HSM is in unknown or invalid state
+
+
+
+
+
+ HBA is in unknown or invalid state
+
+
+
+
+
+ EH needs to make HBA/device forget about in-flight commands
+
+
+
+
+
+ HBA/device behaves weirdly
+
+
+
+
+
+
+ Resetting during EH might be a good idea regardless of error
+ condition to improve EH robustness. Whether to reset both or
+ either one of HBA and device depends on situation but the
+ following scheme is recommended.
+
+
+
+
+
+
+ When it's known that HBA is in ready state but ATA/ATAPI
+ device in in unknown state, reset only device.
+
+
+
+
+
+ If HBA is in unknown state, reset both HBA and device.
+
+
+
+
+
+
+ HBA resetting is implementation specific. For a controller
+ complying to taskfile/BMDMA PCI IDE, stopping active DMA
+ transaction may be sufficient iff BMDMA state is the only HBA
+ context. But even mostly taskfile/BMDMA PCI IDE complying
+ controllers may have implementation specific requirements and
+ mechanism to reset themselves. This must be addressed by
+ specific drivers.
+
+
+
+ OTOH, ATA/ATAPI standard describes in detail ways to reset
+ ATA/ATAPI devices.
+
+
+
+
+ PATA hardware reset
+
+
+ This is hardware initiated device reset signalled with
+ asserted PATA RESET- signal. There is no standard way to
+ initiate hardware reset from software although some
+ hardware provides registers that allow driver to directly
+ tweak the RESET- signal.
+
+
+
+
+ Software reset
+
+
+ This is achieved by turning CONTROL SRST bit on for at
+ least 5us. Both PATA and SATA support it but, in case of
+ SATA, this may require controller-specific support as the
+ second Register FIS to clear SRST should be transmitted
+ while BSY bit is still set. Note that on PATA, this resets
+ both master and slave devices on a channel.
+
+
+
+
+ EXECUTE DEVICE DIAGNOSTIC command
+
+
+ Although ATA/ATAPI standard doesn't describe exactly, EDD
+ implies some level of resetting, possibly similar level
+ with software reset. Host-side EDD protocol can be handled
+ with normal command processing and most SATA controllers
+ should be able to handle EDD's just like other commands.
+ As in software reset, EDD affects both devices on a PATA
+ bus.
+
+
+ Although EDD does reset devices, this doesn't suit error
+ handling as EDD cannot be issued while BSY is set and it's
+ unclear how it will act when device is in unknown/weird
+ state.
+
+
+
+
+ ATAPI DEVICE RESET command
+
+
+ This is very similar to software reset except that reset
+ can be restricted to the selected device without affecting
+ the other device sharing the cable.
+
+
+
+
+ SATA phy reset
+
+
+ This is the preferred way of resetting a SATA device. In
+ effect, it's identical to PATA hardware reset. Note that
+ this can be done with the standard SCR Control register.
+ As such, it's usually easier to implement than software
+ reset.
+
+
+
+
+
+
+
+ One more thing to consider when resetting devices is that
+ resetting clears certain configuration parameters and they
+ need to be set to their previous or newly adjusted values
+ after reset.
+
+
+
+ Parameters affected are.
+
+
+
+
+
+
+ CHS set up with INITIALIZE DEVICE PARAMETERS (seldomly used)
+
+
+
+
+
+ Parameters set with SET FEATURES including transfer mode setting
+
+
+
+
+
+ Block count set with SET MULTIPLE MODE
+
+
+
+
+
+ Other parameters (SET MAX, MEDIA LOCK...)
+
+
+
+
+
+
+ ATA/ATAPI standard specifies that some parameters must be
+ maintained across hardware or software reset, but doesn't
+ strictly specify all of them. Always reconfiguring needed
+ parameters after reset is required for robustness. Note that
+ this also applies when resuming from deep sleep (power-off).
+
+
+
+ Also, ATA/ATAPI standard requires that IDENTIFY DEVICE /
+ IDENTIFY PACKET DEVICE is issued after any configuration
+ parameter is updated or a hardware reset and the result used
+ for further operation. OS driver is required to implement
+ revalidation mechanism to support this.
+
+
+
+
+
+ Reconfigure transport
+
+
+ For both PATA and SATA, a lot of corners are cut for cheap
+ connectors, cables or controllers and it's quite common to see
+ high transmission error rate. This can be mitigated by
+ lowering transmission speed.
+
+
+
+ The following is a possible scheme Jeff Garzik suggested.
+
+
+
+
+ If more than $N (3?) transmission errors happen in 15 minutes,
+
+
+
+
+ if SATA, decrease SATA PHY speed. if speed cannot be decreased,
+
+
+
+
+ decrease UDMA xfer speed. if at UDMA0, switch to PIO4,
+
+
+
+
+ decrease PIO xfer speed. if at PIO3, complain, but continue
+
+
+
+
+
+
+
+
+
+
+
ata_piix Internals
!Idrivers/scsi/ata_piix.c