author | Greg Kroah-Hartman <gregkh@suse.de> | 2005-12-07 16:12:03 -0800 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@suse.de> | 2005-12-07 16:12:03 -0800 |
commit | 1a26058f9d4dc0197733796e0f73d66817dbb1e3 (patch) | |
tree | 1a135585def2314407a6cf9bba54f7fc24ec59fc /pci | |
parent | 4bd956d2e2f9072faecd767e9b27da06b89481f0 (diff) | |
download | patches-1a26058f9d4dc0197733796e0f73d66817dbb1e3.tar.gz |
pci patches added
Diffstat (limited to 'pci')
-rw-r--r-- | pci/pci-error-recovery-documentation.patch | 288 |
-rw-r--r-- | pci/pci-quirk-1k-i-o-space-granularity-on-intel-p64h2.patch | 74 |
-rw-r--r-- | pci/shpchp-implement-get_address-callback.patch | 58 |
3 files changed, 420 insertions, 0 deletions
diff --git a/pci/pci-error-recovery-documentation.patch b/pci/pci-error-recovery-documentation.patch
new file mode 100644
index 0000000000000..ac74e9f940f43
--- /dev/null
+++ b/pci/pci-error-recovery-documentation.patch
@@ -0,0 +1,288 @@
+From linas@austin.ibm.com Fri Dec 2 17:18:03 2005
+Date: Fri, 2 Dec 2005 19:16:18 -0600
+From: <linas@austin.ibm.com>
+To: Greg KH <greg@kroah.com>
+Subject: PCI Error Recovery: documentation
+Message-ID: <20051203011618.GZ31651@austin.ibm.com>
+Content-Disposition: inline
+
+Various PCI bus errors can be signaled by newer PCI controllers.
+Recovering from those errors requires an infrastructure to notify
+affected device drivers of the error, and a way of walking through
+a reset sequence. This patch adds documentation describing the
+current error recovery proposal.
+
+Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
+
+
+---
+ Documentation/pci-error-recovery.txt | 246 +++++++++++++++++++++++++++++++++++
+ MAINTAINERS | 7
+ 2 files changed, 253 insertions(+)
+
+--- /dev/null
++++ gregkh-2.6/Documentation/pci-error-recovery.txt
+@@ -0,0 +1,246 @@
++
++                       PCI Error Recovery
++                       ------------------
++                         May 31, 2005
++
++               Current document maintainer:
++           Linas Vepstas <linas@austin.ibm.com>
++
++
++Some PCI bus controllers are able to detect certain "hard" PCI errors
++on the bus, such as parity errors on the data and address busses, as
++well as SERR and PERR errors. These chipsets are then able to disable
++I/O to/from the affected device, so that, for example, a bad DMA
++address doesn't end up corrupting system memory. These same chipsets
++are also able to reset the affected PCI device, and return it to
++working condition. This document describes a generic API for
++performing error recovery.
++
++The core idea is that after a PCI error has been detected, there must
++be a way for the kernel to coordinate with all affected device drivers
++so that the pci card can be made operational again, possibly after
++performing a full electrical #RST of the PCI card. The API below
++provides a generic API for device drivers to be notified of PCI
++errors, and to be notified of, and respond to, a reset sequence.
++
++Preliminary sketch of API, cut-n-pasted-n-modified email from
++Ben Herrenschmidt, circa 5 april 2005
++
++The error recovery API support is exposed to the driver in the form of
++a structure of function pointers pointed to by a new field in struct
++pci_driver. The absence of this pointer in pci_driver denotes a
++"non-aware" driver; behaviour on these is platform dependent.
++Platforms like ppc64 can try to simulate pci hotplug remove/add.
++
++The definition of "pci_error_token" is not covered here. It is based on
++Seto's work on the synchronous error detection. We still need to define
++functions for extracting information out of an opaque error token. This is
++separate from this API.
++
++This structure has the form:
++
++struct pci_error_handlers
++{
++	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
++	int (*mmio_enabled)(struct pci_dev *dev);
++	int (*resume)(struct pci_dev *dev);
++	int (*link_reset)(struct pci_dev *dev);
++	int (*slot_reset)(struct pci_dev *dev);
++};
++
++A driver doesn't have to implement all of these callbacks. The
++only mandatory one is error_detected(). If a callback is not
++implemented, the corresponding feature is considered unsupported.
++For example, if mmio_enabled() and resume() aren't there, then the
++driver is assumed as not doing any direct recovery and requires
++a reset. If link_reset() is not implemented, the card is assumed as
++not caring about link resets, in which case, if recovery is supported,
++the core can try recovery (but not slot_reset() unless it really did
++reset the slot). If slot_reset() is not supported, link_reset() can
++be called instead on a slot reset.
++
++At first, the call will always be:
++
++   1) error_detected()
++
++   Error detected. This is sent once after an error has been detected. At
++this point, the device might not be accessible anymore depending on the
++platform (the slot will be isolated on ppc64). The driver may already
++have "noticed" the error because of a failing IO, but this is the proper
++"synchronisation point", that is, it gives a chance to the driver to
++clean up, waiting for pending stuff (timers, whatever, etc...) to
++complete; it can take semaphores, schedule, etc... everything but touch
++the device. Within this function and after it returns, the driver
++shouldn't do any new IOs. Called in task context. This is sort of a
++"quiesce" point. See note about interrupts at the end of this doc.
++
++   Result codes:
++      - PCIERR_RESULT_CAN_RECOVER:
++        Driver returns this if it thinks it might be able to recover
++        the HW by just banging IOs or if it wants to be given
++        a chance to extract some diagnostic information (see
++        below).
++      - PCIERR_RESULT_NEED_RESET:
++        Driver returns this if it thinks it can't recover unless the
++        slot is reset.
++      - PCIERR_RESULT_DISCONNECT:
++        Return this if driver thinks it won't recover at all,
++        (this will detach the driver ? or just leave it
++        dangling ? to be decided)
++
++So at this point, we have called error_detected() for all drivers
++on the segment that had the error. On ppc64, the slot is isolated. What
++happens now typically depends on the result from the drivers. If all
++drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would
++re-enable IOs on the slot (or do nothing special if the platform doesn't
++isolate slots) and call 2). If not and we can reset slots, we go to 4),
++if neither, we have a dead slot. If it's a hotplug slot, we might
++"simulate" reset by triggering HW unplug/replug though.
++
++>>> Current ppc64 implementation assumes that a device driver will
++>>> *not* schedule or semaphore in this routine; the current ppc64
++>>> implementation uses one kernel thread to notify all devices;
++>>> thus, if one device sleeps/schedules, all devices are affected.
++>>> Doing better requires complex multi-threaded logic in the error
++>>> recovery implementation (e.g. waiting for all notification threads
++>>> to "join" before proceeding with recovery.) This seems excessively
++>>> complex and not worth implementing.
++
++>>> The current ppc64 implementation doesn't much care if the device
++>>> attempts i/o at this point, or not. I/O's will fail, returning
++>>> a value of 0xff on read, and writes will be dropped. If the device
++>>> driver attempts more than 10K I/O's to a frozen adapter, it will
++>>> assume that the device driver has gone into an infinite loop, and
++>>> it will panic the kernel.
++
++   2) mmio_enabled()
++
++   This is the "early recovery" call. IOs are allowed again, but DMA is
++not (hrm... to be discussed, I prefer not), with some restrictions. This
++is NOT a callback for the driver to start operations again, only to
++peek/poke at the device, extract diagnostic information, if any, and
++eventually do things like trigger a device local reset or some such,
++but not restart operations. This is sent if all drivers on a segment
++agree that they can try to recover and no automatic link reset was
++performed by the HW. If the platform can't just re-enable IOs without
++a slot reset or a link reset, it doesn't call this callback and goes
++directly to 3) or 4). All IOs should be done _synchronously_ from
++within this callback, errors triggered by them will be returned via
++the normal pci_check_whatever() api, no new error_detected() callback
++will be issued due to an error happening here. However, such an error
++might cause IOs to be re-blocked for the whole segment, and thus
++invalidate the recovery that other devices on the same segment might
++have done, forcing the whole segment into one of the next states,
++that is link reset or slot reset.
++
++   Result codes:
++      - PCIERR_RESULT_RECOVERED
++        Driver returns this if it thinks the device is fully
++        functional and thinks it is ready to start
++        normal driver operations again. There is no
++        guarantee that the driver will actually be
++        allowed to proceed, as another driver on the
++        same segment might have failed and thus triggered a
++        slot reset on platforms that support it.
++
++      - PCIERR_RESULT_NEED_RESET
++        Driver returns this if it thinks the device is not
++        recoverable in its current state and it needs a slot
++        reset to proceed.
++
++      - PCIERR_RESULT_DISCONNECT
++        Same as above. Total failure, no recovery even after
++        reset, driver dead. (To be defined more precisely)
++
++>>> The current ppc64 implementation does not implement this callback.
++
++   3) link_reset()
++
++   This is called after the link has been reset. This is typically
++a PCI Express specific state at this point and is done whenever a
++non-fatal error has been detected that can be "solved" by resetting
++the link. This call informs the driver of the reset and the driver
++should check if the device appears to be in working condition.
++This function acts a bit like 2) mmio_enabled(), in that the driver
++is not supposed to restart normal driver I/O operations right away.
++Instead, it should just "probe" the device to check its recoverability
++status. If all is right, then the core will call resume() once all
++drivers have ack'd link_reset().
++
++   Result codes:
++      (identical to mmio_enabled)
++
++>>> The current ppc64 implementation does not implement this callback.
++
++   4) slot_reset()
++
++   This is called after the slot has been soft or hard reset by the
++platform. A soft reset consists of asserting the adapter #RST line
++and then restoring the PCI BARs and PCI configuration header. If the
++platform supports PCI hotplug, then it might instead perform a hard
++reset by toggling power on the slot off/on. This call gives drivers
++the chance to re-initialize the hardware (re-download firmware, etc.),
++but drivers shouldn't restart normal I/O processing operations at
++this point. (See note about interrupts; interrupts aren't guaranteed
++to be delivered until the resume() callback has been called). If all
++device drivers report success on this callback, the platform will call
++resume() to complete the error handling and let the driver restart
++normal I/O processing.
++
++A driver can still return a critical failure for this function if
++it can't get the device operational after reset. If the platform
++previously tried a soft reset, it might now try a hard reset (power
++cycle) and then call slot_reset() again. If the device still can't
++be recovered, there is nothing more that can be done; the platform
++will typically report a "permanent failure" in such a case. The
++device will be considered "dead" in this case.
++
++   Result codes:
++      - PCIERR_RESULT_DISCONNECT
++        Same as above.
++
++>>> The current ppc64 implementation does not try a power-cycle reset
++>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.
++
++   5) resume()
++
++   This is called if all drivers on the segment have returned
++PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks.
++That basically tells the driver to restart activity, that everything
++is back and running. No result code is taken into account here. If
++a new error happens, it will restart a new error handling process.
++
++That's it. I think this covers all the possibilities. The way those
++callbacks are called is platform policy. A platform with no slot reset
++capability for example may want to just "ignore" drivers that can't
++recover (disconnect them) and try to let other cards on the same segment
++recover. Keep in mind that in most real life cases, though, there will
++be only one driver per segment.
++
++Now, there is a note about interrupts. If you get an interrupt and your
++device is dead or has been isolated, there is a problem :)
++
++After much thinking, I decided to leave that to the platform. That is,
++the recovery API only specifies that:
++
++ - There is no guarantee that interrupt delivery can proceed from any
++device on the segment starting from the error detection and until the
++restart callback is sent, at which point interrupts are expected to be
++fully operational.
++
++ - There is no guarantee that interrupt delivery is stopped, that is, a
++driver that gets an interrupt after detecting an error, or that detects
++an error within the interrupt handler such that it prevents proper
++ack'ing of the interrupt (and thus removal of the source) should just
++return IRQ_NOTHANDLED. It's up to the platform to deal with that
++condition, typically by masking the irq source during the duration of
++the error handling. It is expected that the platform "knows" which
++interrupts are routed to error-management capable slots and can deal
++with temporarily disabling that irq number during error processing (this
++isn't terribly complex). That means some IRQ latency for other devices
++sharing the interrupt, but there is simply no other way. High end
++platforms aren't supposed to share interrupts between many devices
++anyway :)
++
++
++Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>
+--- gregkh-2.6.orig/MAINTAINERS
++++ gregkh-2.6/MAINTAINERS
+@@ -1978,6 +1978,13 @@ M: hch@infradead.org
+ L:	linux-abi-devel@lists.sourceforge.net
+ S:	Maintained
+ 
++PCI ERROR RECOVERY
++P:	Linas Vepstas
++M:	linas@austin.ibm.com
++L:	linux-kernel@vger.kernel.org
++L:	linux-pci@atrey.karlin.mff.cuni.cz
++S:	Supported
++
+ PCI SOUND DRIVERS (ES1370, ES1371 and SONICVIBES)
+ P:	Thomas Sailer
+ M:	sailer@ife.ee.ethz.ch
diff --git a/pci/pci-quirk-1k-i-o-space-granularity-on-intel-p64h2.patch b/pci/pci-quirk-1k-i-o-space-granularity-on-intel-p64h2.patch
new file mode 100644
index 0000000000000..26adb06598471
--- /dev/null
+++ b/pci/pci-quirk-1k-i-o-space-granularity-on-intel-p64h2.patch
@@ -0,0 +1,74 @@
+From dan.yeisley@unisys.com Mon Dec 5 05:59:09 2005
+From: Daniel Yeisley <dan.yeisley@unisys.com>
+Subject: PCI Quirk: 1K I/O space granularity on Intel P64H2
+To: gregkh@suse.de
+Date: Mon, 05 Dec 2005 07:06:43 -0500
+Message-Id: <1133784403.15921.33.camel@localhost.localdomain>
+
+I've implemented a quirk to take advantage of the 1KB I/O space
+granularity option on the Intel P64H2 PCI Bridge. I had to change
+probe.c because it sets the resource start and end to be aligned on 4k
+boundaries (after the quirk sets them to 1k boundaries). I've tested
+this patch on a Unisys ES7000-600 both with and without the 1KB option
+enabled. I also tested this on a 2 processor Dell box that doesn't have
+a P64H2 to make sure there were no negative effects there.
+
+Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
+---
+ drivers/pci/probe.c | 6 ++++--
+ drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
+ 2 files changed, 32 insertions(+), 2 deletions(-)
+
+--- gregkh-2.6.orig/drivers/pci/probe.c
++++ gregkh-2.6/drivers/pci/probe.c
+@@ -264,8 +264,10 @@ void __devinit pci_read_bridge_bases(str
+ 
+ 	if (base <= limit) {
+ 		res->flags = (io_base_lo & PCI_IO_RANGE_TYPE_MASK) | IORESOURCE_IO;
+-		res->start = base;
+-		res->end = limit + 0xfff;
++		if (!res->start)
++			res->start = base;
++		if (!res->end)
++			res->end = limit + 0xfff;
+ 	}
+ 
+ 	res = child->resource[1];
+--- gregkh-2.6.orig/drivers/pci/quirks.c
++++ gregkh-2.6/drivers/pci/quirks.c
+@@ -1312,6 +1312,34 @@ void pci_fixup_device(enum pci_fixup_pas
+ 	pci_do_fixups(dev, start, end);
+ }
+ 
++/*
++ * Enable 1k I/O space granularity on the Intel P64H2
++ */
++static void __devinit quirk_p64h2_1k_io(struct pci_dev *dev)
++{
++	u16 en1k;
++	u8 io_base_lo, io_limit_lo;
++	unsigned long base, limit;
++	struct resource *res = dev->resource + PCI_BRIDGE_RESOURCES;
++
++	pci_read_config_word(dev, 0x40, &en1k);
++
++	if (en1k & 0x200) {
++		printk(KERN_INFO "PCI: Enable I/O Space to 1 KB Granularity\n");
++
++		pci_read_config_byte(dev, PCI_IO_BASE, &io_base_lo);
++		pci_read_config_byte(dev, PCI_IO_LIMIT, &io_limit_lo);
++		base = (io_base_lo & (PCI_IO_RANGE_MASK | 0x0c)) << 8;
++		limit = (io_limit_lo & (PCI_IO_RANGE_MASK | 0x0c)) << 8;
++
++		if (base <= limit) {
++			res->start = base;
++			res->end = limit + 0x3ff;
++		}
++	}
++}
++DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1460, quirk_p64h2_1k_io);
++
+ EXPORT_SYMBOL_GPL(pcie_mch_quirk);
+ #ifdef CONFIG_HOTPLUG
+ EXPORT_SYMBOL_GPL(pci_fixup_device);
diff --git a/pci/shpchp-implement-get_address-callback.patch b/pci/shpchp-implement-get_address-callback.patch
new file mode 100644
index 0000000000000..c26fa5b4f21bd
--- /dev/null
+++ b/pci/shpchp-implement-get_address-callback.patch
@@ -0,0 +1,58 @@
+From kaneshige.kenji@jp.fujitsu.com Mon Dec 5 02:34:13 2005
+Message-ID: <439416E4.90800@jp.fujitsu.com>
+Date: Mon, 05 Dec 2005 19:31:00 +0900
+From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
+To: <greg@kroah.com>, <kristen.c.accardi@intel.com>
+Subject: shpchp: Implement get_address callback
+
+The following patch implements .get_address callback of
+hotplug_slot_ops for SHPCHP driver. With this patch, we
+can see bus address of hotplug slots as follows:
+
+    $ cat address
+    0000:0b:01
+
+Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
+
+---
+ drivers/pci/hotplug/shpchp_core.c | 14 ++++++++++++++
+ 1 file changed, 14 insertions(+)
+
+--- gregkh-2.6.orig/drivers/pci/hotplug/shpchp_core.c
++++ gregkh-2.6/drivers/pci/hotplug/shpchp_core.c
+@@ -65,6 +65,7 @@ static int get_power_status (struct hotp
+ static int get_attention_status (struct hotplug_slot *slot, u8 *value);
+ static int get_latch_status (struct hotplug_slot *slot, u8 *value);
+ static int get_adapter_status (struct hotplug_slot *slot, u8 *value);
++static int get_address (struct hotplug_slot *slot, u32 *value);
+ static int get_max_bus_speed (struct hotplug_slot *slot, enum pci_bus_speed *value);
+ static int get_cur_bus_speed (struct hotplug_slot *slot, enum pci_bus_speed *value);
+ 
+@@ -77,6 +78,7 @@ static struct hotplug_slot_ops shpchp_ho
+ 	.get_attention_status = get_attention_status,
+ 	.get_latch_status = get_latch_status,
+ 	.get_adapter_status = get_adapter_status,
++	.get_address = get_address,
+ 	.get_max_bus_speed = get_max_bus_speed,
+ 	.get_cur_bus_speed = get_cur_bus_speed,
+ };
+@@ -314,6 +316,18 @@ static int get_adapter_status (struct ho
+ 	return 0;
+ }
+ 
++static int get_address (struct hotplug_slot *hotplug_slot, u32 *value)
++{
++	struct slot *slot = get_slot (hotplug_slot, __FUNCTION__);
++	struct pci_bus *bus = slot->ctrl->pci_dev->subordinate;
++
++	dbg("%s - physical_slot = %s\n", __FUNCTION__, hotplug_slot->name);
++
++	*value = (pci_domain_nr(bus) << 16) | (slot->bus << 8) | slot->device;
++
++	return 0;
++}
++
+ static int get_max_bus_speed (struct hotplug_slot *hotplug_slot, enum pci_bus_speed *value)
+ {
+ 	struct slot *slot = get_slot (hotplug_slot, __FUNCTION__);
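
The get_address() hunk in the shpchp patch above packs the PCI domain, bus and device numbers into one u32. The following is a minimal user-space sketch of that bit layout, not kernel code. The helper names pack_slot_address() and format_slot_address() are hypothetical, and the "%04x:%02x:%02x" rendering is an assumption inferred from the "0000:0b:01" example in the commit message, not taken from the hotplug core.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: combine domain, bus and device using the same
 * shifts as the get_address() hunk (domain << 16 | bus << 8 | device).
 */
static uint32_t pack_slot_address(uint32_t domain, uint32_t bus, uint32_t device)
{
	return (domain << 16) | (bus << 8) | device;
}

/*
 * Hypothetical helper: render a packed address as domain:bus:device in
 * hex. The field widths are assumed from the commit message example.
 * Returns the snprintf() result (characters that would be written).
 */
static int format_slot_address(uint32_t address, char *buf, size_t len)
{
	return snprintf(buf, len, "%04x:%02x:%02x",
			(unsigned int)((address >> 16) & 0xffff),
			(unsigned int)((address >> 8) & 0xff),
			(unsigned int)(address & 0xff));
}
```

With domain 0, bus 0x0b and device 0x01, the packed value is 0x00000b01, which this sketch formats as the string shown in the commit message. Note that the layout leaves only 8 bits each for bus and device, so a bus or device number above 0xff would spill into the neighbouring field.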