X-Git-Url: https://err.no/cgi-bin/gitweb.cgi?a=blobdiff_plain;ds=sidebyside;f=Documentation%2FDocBook%2Flibata.tmpl;h=e97c3231454173e938f9554929420c441683a9de;hb=ea6e1e94f2cb9ae54bd1428e1ef3e84a749ceed8;hp=b2ec780bcda10da1f835de1527331d054afe592a;hpb=0fbbbf2bde4da5cb01a949c3d7b21c0627f520a8;p=linux-2.6 diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index b2ec780bcd..e97c323145 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -120,14 +120,27 @@ void (*dev_config) (struct ata_port *, struct ata_device *); void (*set_piomode) (struct ata_port *, struct ata_device *); void (*set_dmamode) (struct ata_port *, struct ata_device *); -void (*post_set_mode) (struct ata_port *ap); +void (*post_set_mode) (struct ata_port *); +unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int); Hooks called prior to the issue of SET FEATURES - XFER MODE - command. dev->pio_mode is guaranteed to be valid when - ->set_piomode() is called, and dev->dma_mode is guaranteed to be - valid when ->set_dmamode() is called. ->post_set_mode() is + command. The optional ->mode_filter() hook is called when libata + has built a mask of the possible modes. This is passed to the + ->mode_filter() function which should return a mask of valid modes + after filtering those unsuitable due to hardware limits. It is not + valid to use this interface to add modes. + + + dev->pio_mode and dev->dma_mode are guaranteed to be valid when + ->set_piomode() and when ->set_dmamode() is called. The timings for + any other drive sharing the cable will also be valid at this point. + That is the library records the decisions for the modes of each + drive on a channel before it attempts to set any of them. + + + ->post_set_mode() is called unconditionally, after the SET FEATURES - XFER MODE command completes successfully. @@ -156,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); + PIO data read/write + +void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); + + + +All bmdma-style drivers must implement this hook. This is the low-level +operation that actually copies the data bytes during a PIO data +transfer. +Typically the driver +will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or +ata_mmio_data_xfer(). + + + + ATA command execute void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); @@ -191,11 +220,10 @@ command. u8 (*check_status)(struct ata_port *ap); u8 (*check_altstatus)(struct ata_port *ap); -u8 (*check_err)(struct ata_port *ap); - Reads the Status/AltStatus/Error ATA shadow register from + Reads the Status/AltStatus ATA shadow register from hardware. On some hardware, reading the Status register has the side effect of clearing the interrupt condition. Most drivers for taskfile-based hardware use @@ -230,21 +258,30 @@ void (*dev_select)(struct ata_port *ap, unsigned int device); - Reset ATA bus + Private tuning method -void (*phy_reset) (struct ata_port *ap); +void (*set_mode) (struct ata_port *ap); - The very first step in the probe phase. Actions vary depending - on the bus type, typically. After waking up the device and probing - for device presence (PATA and SATA), typically a soft reset - (SRST) will be performed. Drivers typically use the helper - functions ata_bus_reset() or sata_phy_reset() for this hook. - Many SATA drivers use sata_phy_reset() or call it from within - their own phy_reset() functions. + By default libata performs drive and controller tuning in + accordance with the ATA timing rules and also applies blacklists + and cable limits. Some controllers need special handling and have + custom tuning rules, typically raid controllers that use ATA + commands but do not actually do drive timing. + + + This hook should not be used to replace the standard controller + tuning logic when a controller has quirks. Replacing the default + tuning logic in that case would bypass handling for drive and + bridge quirks that may be important to data reliability. If a + controller needs to filter the mode selection it should use the + mode_filter hook instead. + + + Control PCI IDE BMDMA engine @@ -315,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc); - Timeout (error) handling + Exception and probe handling (EH) void (*eng_timeout) (struct ata_port *ap); +void (*phy_reset) (struct ata_port *ap); + + + +Deprecated. Use ->error_handler() instead. + + + +void (*freeze) (struct ata_port *ap); +void (*thaw) (struct ata_port *ap); + + + +ata_port_freeze() is called when HSM violations or some other +condition disrupts normal operation of the port. A frozen port +is not allowed to perform any operation until the port is +thawed, which usually follows a successful reset. + + + +The optional ->freeze() callback can be used for freezing the port +hardware-wise (e.g. mask interrupt and stop DMA engine). If a +port cannot be frozen hardware-wise, the interrupt handler +must ack and clear interrupts unconditionally while the port +is frozen. + + +The optional ->thaw() callback is called to perform the opposite of ->freeze(): +prepare the port for normal operation once again. Unmask interrupts, +start DMA engine, etc. + + + +void (*error_handler) (struct ata_port *ap); -This is a high level error handling function, called from the -error handling thread, when a command times out. Most newer -hardware will implement its own error handling code here. IDE BMDMA -drivers may use the helper function ata_eng_timeout(). +->error_handler() is a driver's hook into probe, hotplug, and recovery +and other exceptional conditions. The primary responsibility of an +implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set +of EH hooks as arguments: + + + +'prereset' hook (may be NULL) is called during an EH reset, before any other actions +are taken. + + + +'postreset' hook (may be NULL) is called after the EH reset is performed. Based on +existing conditions, severity of the problem, and hardware capabilities, + + + +Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be +called to perform the low-level EH reset. + + + +void (*post_internal_cmd) (struct ata_queued_cmd *qc); + + + +Perform any hardware-specific actions necessary to finish processing +after executing a probe-time or EH-time command via ata_exec_internal(). @@ -666,7 +761,7 @@ and other resources, etc. ata_scsi_error() - ata_scsi_error() is the current hostt->eh_strategy_handler() + ata_scsi_error() is the current transportt->eh_strategy_handler() for libata. As discussed above, this will be entered in two cases - timeout and ATAPI error completion. This function calls low level libata driver's eng_timeout() callback, the @@ -787,6 +882,722 @@ and other resources, etc. !Idrivers/scsi/libata-scsi.c + + ATA errors & exceptions + + + This chapter tries to identify what error/exception conditions exist + for ATA/ATAPI devices and describe how they should be handled in + implementation-neutral way. + + + + The term 'error' is used to describe conditions where either an + explicit error condition is reported from device or a command has + timed out. + + + + The term 'exception' is either used to describe exceptional + conditions which are not errors (say, power or hotplug events), or + to describe both errors and non-error exceptional conditions. Where + explicit distinction between error and exception is necessary, the + term 'non-error exception' is used. + + + + Exception categories + + Exceptions are described primarily with respect to legacy + taskfile + bus master IDE interface. If a controller provides + other better mechanism for error reporting, mapping those into + categories described below shouldn't be difficult. + + + + In the following sections, two recovery actions - reset and + reconfiguring transport - are mentioned. These are described + further in . + + + + HSM violation + + This error is indicated when STATUS value doesn't match HSM + requirement during issuing or excution any ATA/ATAPI command. + + + + Examples + + + + ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying + to issue a command. + + + + + + !BSY && !DRQ during PIO data transfer. + + + + + + DRQ on command completion. + + + + + + !BSY && ERR after CDB tranfer starts but before the + last byte of CDB is transferred. ATA/ATAPI standard states + that "The device shall not terminate the PACKET command + with an error before the last byte of the command packet has + been written" in the error outputs description of PACKET + command and the state diagram doesn't include such + transitions. + + + + + + + In these cases, HSM is violated and not much information + regarding the error can be acquired from STATUS or ERROR + register. IOW, this error can be anything - driver bug, + faulty device, controller and/or cable. + + + + As HSM is violated, reset is necessary to restore known state. + Reconfiguring transport for lower speed might be helpful too + as transmission errors sometimes cause this kind of errors. + + + + + ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION) + + + These are errors detected and reported by ATA/ATAPI devices + indicating device problems. For this type of errors, STATUS + and ERROR register values are valid and describe error + condition. Note that some of ATA bus errors are detected by + ATA/ATAPI devices and reported using the same mechanism as + device errors. Those cases are described later in this + section. + + + + For ATA commands, this type of errors are indicated by !BSY + && ERR during command execution and on completion. + + + For ATAPI commands, + + + + + + !BSY && ERR && ABRT right after issuing PACKET + indicates that PACKET command is not supported and falls in + this category. + + + + + + !BSY && ERR(==CHK) && !ABRT after the last + byte of CDB is transferred indicates CHECK CONDITION and + doesn't fall in this category. + + + + + + !BSY && ERR(==CHK) && ABRT after the last byte + of CDB is transferred *probably* indicates CHECK CONDITION and + doesn't fall in this category. + + + + + + + Of errors detected as above, the followings are not ATA/ATAPI + device errors but ATA bus errors and should be handled + according to . + + + + + + CRC error during data transfer + + + This is indicated by ICRC bit in the ERROR register and + means that corruption occurred during data transfer. Upto + ATA/ATAPI-7, the standard specifies that this bit is only + applicable to UDMA transfers but ATA/ATAPI-8 draft revision + 1f says that the bit may be applicable to multiword DMA and + PIO. + + + + + + ABRT error during data transfer or on completion + + + Upto ATA/ATAPI-7, the standard specifies that ABRT could be + set on ICRC errors and on cases where a device is not able + to complete a command. Combined with the fact that MWDMA + and PIO transfer errors aren't allowed to use ICRC bit upto + ATA/ATAPI-7, it seems to imply that ABRT bit alone could + indicate tranfer errors. + + + However, ATA/ATAPI-8 draft revision 1f removes the part + that ICRC errors can turn on ABRT. So, this is kind of + gray area. Some heuristics are needed here. + + + + + + + + ATA/ATAPI device errors can be further categorized as follows. + + + + + + Media errors + + + This is indicated by UNC bit in the ERROR register. ATA + devices reports UNC error only after certain number of + retries cannot recover the data, so there's nothing much + else to do other than notifying upper layer. + + + READ and WRITE commands report CHS or LBA of the first + failed sector but ATA/ATAPI standard specifies that the + amount of transferred data on error completion is + indeterminate, so we cannot assume that sectors preceding + the failed sector have been transferred and thus cannot + complete those sectors successfully as SCSI does. + + + + + + Media changed / media change requested error + + + <<TODO: fill here>> + + + + + Address error + + + This is indicated by IDNF bit in the ERROR register. + Report to upper layer. + + + + + Other errors + + + This can be invalid command or parameter indicated by ABRT + ERROR bit or some other error condition. Note that ABRT + bit can indicate a lot of things including ICRC and Address + errors. Heuristics needed. + + + + + + + + Depending on commands, not all STATUS/ERROR bits are + applicable. These non-applicable bits are marked with + "na" in the output descriptions but upto ATA/ATAPI-7 + no definition of "na" can be found. However, + ATA/ATAPI-8 draft revision 1f describes "N/A" as + follows. + + +
+ + 3.2.3.3a N/A + + + A keyword the indicates a field has no defined value in + this standard and should not be checked by the host or + device. N/A fields should be cleared to zero. + + + + +
+ + + So, it seems reasonable to assume that "na" bits are + cleared to zero by devices and thus need no explicit masking. + + +
+ + + ATAPI device CHECK CONDITION + + + ATAPI device CHECK CONDITION error is indicated by set CHK bit + (ERR bit) in the STATUS register after the last byte of CDB is + transferred for a PACKET command. For this kind of errors, + sense data should be acquired to gather information regarding + the errors. REQUEST SENSE packet command should be used to + acquire sense data. + + + + Once sense data is acquired, this type of errors can be + handled similary to other SCSI errors. Note that sense data + may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR + && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such + cases, the error should be considered as an ATA bus error and + handled according to . + + + + + + ATA device error (NCQ) + + + NCQ command error is indicated by cleared BSY and set ERR bit + during NCQ command phase (one or more NCQ commands + outstanding). Although STATUS and ERROR registers will + contain valid values describing the error, READ LOG EXT is + required to clear the error condition, determine which command + has failed and acquire more information. + + + + READ LOG EXT Log Page 10h reports which tag has failed and + taskfile register values describing the error. With this + information the failed command can be handled as a normal ATA + command error as in and all + other in-flight commands must be retried. Note that this + retry should not be counted - it's likely that commands + retried this way would have completed normally if it were not + for the failed command. + + + + Note that ATA bus errors can be reported as ATA device NCQ + errors. This should be handled as described in . + + + + If READ LOG EXT Log Page 10h fails or reports NQ, we're + thoroughly screwed. This condition should be treated + according to . + + + + + + ATA bus error + + + ATA bus error means that data corruption occurred during + transmission over ATA bus (SATA or PATA). This type of errors + can be indicated by + + + + + + + ICRC or ABRT error as described in . + + + + + + Controller-specific error completion with error information + indicating transmission error. + + + + + + On some controllers, command timeout. In this case, there may + be a mechanism to determine that the timeout is due to + transmission error. + + + + + + Unknown/random errors, timeouts and all sorts of weirdities. + + + + + + + As described above, transmission errors can cause wide variety + of symptoms ranging from device ICRC error to random device + lockup, and, for many cases, there is no way to tell if an + error condition is due to transmission error or not; + therefore, it's necessary to employ some kind of heuristic + when dealing with errors and timeouts. For example, + encountering repetitive ABRT errors for known supported + command is likely to indicate ATA bus error. + + + + Once it's determined that ATA bus errors have possibly + occurred, lowering ATA bus transmission speed is one of + actions which may alleviate the problem. See for more information. + + + + + + PCI bus error + + + Data corruption or other failures during transmission over PCI + (or other system bus). For standard BMDMA, this is indicated + by Error bit in the BMDMA Status register. This type of + errors must be logged as it indicates something is very wrong + with the system. Resetting host controller is recommended. + + + + + + Late completion + + + This occurs when timeout occurs and the timeout handler finds + out that the timed out command has completed successfully or + with error. This is usually caused by lost interrupts. This + type of errors must be logged. Resetting host controller is + recommended. + + + + + + Unknown error (timeout) + + + This is when timeout occurs and the command is still + processing or the host and device are in unknown state. When + this occurs, HSM could be in any valid or invalid state. To + bring the device to known state and make it forget about the + timed out command, resetting is necessary. The timed out + command may be retried. + + + + Timeouts can also be caused by transmission errors. Refer to + for more details. + + + + + + Hotplug and power management exceptions + + + <<TODO: fill here>> + + + + +
+ + + EH recovery actions + + + This section discusses several important recovery actions. + + + + Clearing error condition + + + Many controllers require its error registers to be cleared by + error handler. Different controllers may have different + requirements. + + + + For SATA, it's strongly recommended to clear at least SError + register during error handling. + + + + + Reset + + + During EH, resetting is necessary in the following cases. + + + + + + + HSM is in unknown or invalid state + + + + + + HBA is in unknown or invalid state + + + + + + EH needs to make HBA/device forget about in-flight commands + + + + + + HBA/device behaves weirdly + + + + + + + Resetting during EH might be a good idea regardless of error + condition to improve EH robustness. Whether to reset both or + either one of HBA and device depends on situation but the + following scheme is recommended. + + + + + + + When it's known that HBA is in ready state but ATA/ATAPI + device in in unknown state, reset only device. + + + + + + If HBA is in unknown state, reset both HBA and device. + + + + + + + HBA resetting is implementation specific. For a controller + complying to taskfile/BMDMA PCI IDE, stopping active DMA + transaction may be sufficient iff BMDMA state is the only HBA + context. But even mostly taskfile/BMDMA PCI IDE complying + controllers may have implementation specific requirements and + mechanism to reset themselves. This must be addressed by + specific drivers. + + + + OTOH, ATA/ATAPI standard describes in detail ways to reset + ATA/ATAPI devices. + + + + + PATA hardware reset + + + This is hardware initiated device reset signalled with + asserted PATA RESET- signal. There is no standard way to + initiate hardware reset from software although some + hardware provides registers that allow driver to directly + tweak the RESET- signal. + + + + + Software reset + + + This is achieved by turning CONTROL SRST bit on for at + least 5us. Both PATA and SATA support it but, in case of + SATA, this may require controller-specific support as the + second Register FIS to clear SRST should be transmitted + while BSY bit is still set. Note that on PATA, this resets + both master and slave devices on a channel. + + + + + EXECUTE DEVICE DIAGNOSTIC command + + + Although ATA/ATAPI standard doesn't describe exactly, EDD + implies some level of resetting, possibly similar level + with software reset. Host-side EDD protocol can be handled + with normal command processing and most SATA controllers + should be able to handle EDD's just like other commands. + As in software reset, EDD affects both devices on a PATA + bus. + + + Although EDD does reset devices, this doesn't suit error + handling as EDD cannot be issued while BSY is set and it's + unclear how it will act when device is in unknown/weird + state. + + + + + ATAPI DEVICE RESET command + + + This is very similar to software reset except that reset + can be restricted to the selected device without affecting + the other device sharing the cable. + + + + + SATA phy reset + + + This is the preferred way of resetting a SATA device. In + effect, it's identical to PATA hardware reset. Note that + this can be done with the standard SCR Control register. + As such, it's usually easier to implement than software + reset. + + + + + + + + One more thing to consider when resetting devices is that + resetting clears certain configuration parameters and they + need to be set to their previous or newly adjusted values + after reset. + + + + Parameters affected are. + + + + + + + CHS set up with INITIALIZE DEVICE PARAMETERS (seldomly used) + + + + + + Parameters set with SET FEATURES including transfer mode setting + + + + + + Block count set with SET MULTIPLE MODE + + + + + + Other parameters (SET MAX, MEDIA LOCK...) + + + + + + + ATA/ATAPI standard specifies that some parameters must be + maintained across hardware or software reset, but doesn't + strictly specify all of them. Always reconfiguring needed + parameters after reset is required for robustness. Note that + this also applies when resuming from deep sleep (power-off). + + + + Also, ATA/ATAPI standard requires that IDENTIFY DEVICE / + IDENTIFY PACKET DEVICE is issued after any configuration + parameter is updated or a hardware reset and the result used + for further operation. OS driver is required to implement + revalidation mechanism to support this. + + + + + + Reconfigure transport + + + For both PATA and SATA, a lot of corners are cut for cheap + connectors, cables or controllers and it's quite common to see + high transmission error rate. This can be mitigated by + lowering transmission speed. + + + + The following is a possible scheme Jeff Garzik suggested. + + +
+ + If more than $N (3?) transmission errors happen in 15 minutes, + + + + + if SATA, decrease SATA PHY speed. if speed cannot be decreased, + + + + + decrease UDMA xfer speed. if at UDMA0, switch to PIO4, + + + + + decrease PIO xfer speed. if at PIO3, complain, but continue + + + +
+ +
+ +
+ +
+ ata_piix Internals !Idrivers/scsi/ata_piix.c