- May 31, 2005
-
- Current document maintainer:
- Linas Vepstas <linas@austin.ibm.com>
-
-
-Some PCI bus controllers are able to detect certain "hard" PCI errors
-on the bus, such as parity errors on the data and address busses, as
-well as SERR and PERR errors. These chipsets are then able to disable
-I/O to/from the affected device, so that, for example, a bad DMA
-address doesn't end up corrupting system memory. These same chipsets
-are also able to reset the affected PCI device, and return it to
-working condition. This document describes a generic API form
-performing error recovery.
-
-The core idea is that after a PCI error has been detected, there must
-be a way for the kernel to coordinate with all affected device drivers
-so that the pci card can be made operational again, possibly after
-performing a full electrical #RST of the PCI card. The API below
-provides a generic API for device drivers to be notified of PCI
-errors, and to be notified of, and respond to, a reset sequence.
-
-Preliminary sketch of API, cut-n-pasted-n-modified email from
-Ben Herrenschmidt, circa 5 april 2005
+ February 2, 2006
+
+ Current document maintainer:
+ Linas Vepstas <linas@austin.ibm.com>
+
+
+Many PCI bus controllers are able to detect a variety of hardware
+PCI errors on the bus, such as parity errors on the data and address
+busses, as well as SERR and PERR errors. Some of the more advanced
+chipsets are able to deal with these errors; these include PCI-E chipsets,
+and the PCI-host bridges found on IBM Power4 and Power5-based pSeries
+boxes. A typical action taken is to disconnect the affected device,
+halting all I/O to it. The goal of a disconnection is to avoid system
+corruption; for example, to halt system memory corruption due to DMA's
+to "wild" addresses. Typically, a reconnection mechanism is also
+offered, so that the affected PCI device(s) are reset and put back
+into working condition. The reset phase requires coordination
+between the affected device drivers and the PCI controller chip.
+This document describes a generic API for notifying device drivers
+of a bus disconnection, and then performing error recovery.
+This API is currently implemented in the 2.6.16 and later kernels.
+
+Reporting and recovery is performed in several steps. First, when
+a PCI hardware error has resulted in a bus disconnect, that event
+is reported as soon as possible to all affected device drivers,
+including multiple instances of a device driver on multi-function
+cards. This allows device drivers to avoid deadlocking in spinloops,
+waiting for some i/o-space register to change, when it never will.
+It also gives the drivers a chance to defer incoming I/O as
+needed.
+
+Next, recovery is performed in several stages. Most of the complexity
+is forced by the need to handle multi-function devices, that is,
+devices that have multiple device drivers associated with them.
+In the first stage, each driver is allowed to indicate what type
+of reset it desires, the choices being a simple re-enabling of I/O
+or requesting a hard reset (a full electrical #RST of the PCI card).
+If any driver requests a full reset, that is what will be done.
+
+After a full reset and/or a re-enabling of I/O, all drivers are
+again notified, so that they may then perform any device setup/config
+that may be required. After these have all completed, a final
+"resume normal operations" event is sent out.
+
+The biggest reason for choosing a kernel-based implementation rather
+than a user-space implementation was the need to deal with bus
+disconnects of PCI devices attached to storage media, and, in particular,
+disconnects from devices holding the root file system. If the root
+file system is disconnected, a user-space mechanism would have to go
+through a large number of contortions to complete recovery. Almost all
+of the current Linux file systems are not tolerant of disconnection
+from/reconnection to their underlying block device. By contrast,
+bus errors are easy to manage in the device driver. Indeed, most
+device drivers already handle very similar recovery procedures;
+for example, the SCSI-generic layer already provides significant
+mechanisms for dealing with SCSI bus errors and SCSI bus resets.
+
+
+Detailed Design
+---------------
+Design and implementation details below, based on a chain of
+public email discussions with Ben Herrenschmidt, circa 5 April 2005.