ChangeSet 1.995, 2003/02/20 11:11:24-08:00, david-b@pacbell.net

[PATCH] USB: ehci-hcd, more hangs gone

The key update in this patch is an important qh state machine fix.
In my testing, that removes hangs that I could reproduce on VIA and
Philips (much friendlier failures), without resort to sadism.

I suspect VT6202 users will still complain. (I just saw one test
deadlock in usb-storage; and at least one bug in the EHCI code
enjoys hide-and-seek too much.) And the VT8235 and "hdparm -tT"
numbers are still about half what they should be (on 2.4).


diff -Nru a/Documentation/usb/ehci.txt b/Documentation/usb/ehci.txt
--- a/Documentation/usb/ehci.txt	Thu Feb 20 12:07:09 2003
+++ b/Documentation/usb/ehci.txt	Thu Feb 20 12:07:09 2003
@@ -1,4 +1,4 @@
-26-Apr-2002
+27-Dec-2002
 
 The EHCI driver is used to talk to high speed USB 2.0 devices using
 USB 2.0-capable host controller hardware. The USB 2.0 standard is
@@ -21,11 +21,15 @@
 
 At this writing, this driver has been seen to work with implementations
 of EHCI from (in alphabetical order): Intel, NEC, Philips, and VIA.
+Other EHCI implementations are becoming available from other vendors;
+you should expect this driver to work with them too.
 
-At this writing, high speed devices are finally beginning to appear.
-While usb-storage devices have been available for some time (working
+While usb-storage devices have been available since mid-2001 (working
 quite speedily on the 2.4 version of this driver), hubs have only
-very recently become available.
+been available since late 2001, and other kinds of high speed devices
+appear to be on hold until more systems come with USB 2.0 built-in.
+Such new systems have been available since early 2002, and became much
+more typical in the second half of 2002.
 
 Note that USB 2.0 support involves more than just EHCI. It requires
 other changes to the Linux-USB core APIs, including the hub driver,
@@ -43,26 +47,29 @@
 It's believed to do all the right PCI magic so that I/O works even on
 systems with interesting DMA mapping issues.
 
-At this writing the driver should comfortably handle all control and bulk
-transfers, including requests to USB 1.1 devices through transaction
-translators (TTs) in USB 2.0 hubs. However, there some situations where
-the hub driver needs to clear TT error state, which it doesn't yet do.
-
-Interrupt transfer support is newly functional and not yet as robust as
-control and bulk traffic. As yet there is no support for split transaction
-scheduling for interrupt transfers, which means among other things that
-connecting USB 1.1 hubs, keyboards, and mice to USB 2.0 hubs won't work.
-Connect them to USB 1.1 hubs, or to a root hub.
-
-Isochronous (ISO) transfer support is also newly functional. No production
-high speed devices are available which would need it (though high quality
-webcams are in the works!). Note that split transaction support for ISO
+Transfer Types
+
+At this writing the driver should comfortably handle all control, bulk,
+and interrupt transfers, including requests to USB 1.1 devices through
+transaction translators (TTs) in USB 2.0 hubs. But you may find bugs.
+
+High Speed Isochronous (ISO) transfer support is also functional, but
+at this writing no Linux drivers have been using that support.
+
+Full Speed Isochronous transfer support, through transaction translators,
+is not yet available. Note that split transaction support for ISO
 transfers can't share much code with the code for high speed ISO transfers,
 since EHCI represents these with a different data structure. So for now,
-most USB audio and video devices have the same restrictions as hubs, mice,
-and keyboards: don't connect them using high speed USB hubs.
+most USB audio and video devices can't be connected to high speed buses.
+
+Driver Behavior
 
-The EHCI root hub code should hand off USB 1.1 devices to its companion
+Transfers of all types can be queued. This means that control transfers
+from a driver on one interface (or through usbfs) won't interfere with
+ones from another driver, and that interrupt transfers can use periods
+of one frame without risking data loss due to interrupt processing costs.
+
+The EHCI root hub code hands off USB 1.1 devices to its companion
 controller. This driver doesn't need to know anything about those
 drivers; a OHCI or UHCI driver that works already doesn't need to change
 just because the EHCI driver is also present.
@@ -70,6 +77,11 @@
 There are some issues with power management; suspend/resume doesn't
 behave quite right at the moment.
 
+Also, some shortcuts have been taken with the scheduling periodic
+transactions (interrupt and isochronous transfers). These place some
+limits on the number of periodic transactions that can be scheduled,
+and prevent use of polling intervals of less than one frame.
+
 
 USE BY
 
@@ -83,10 +95,10 @@
     # rmmod ehci-hcd
 
 You should also have a driver for a "companion controller", such as
-"ohci-hcd", "usb-ohci", "usb-uhci", or "uhci". In case of any trouble
-with the EHCI driver, remove its module and then the driver for that
-companion controller will take over (at lower speed) all the devices
-that were previously handled by the EHCI driver.
+"ohci-hcd" or "uhci-hcd". In case of any trouble with the EHCI driver,
+remove its module and then the driver for that companion controller will
+take over (at lower speed) all the devices that were previously handled
+by the EHCI driver.
 
 Module parameters (pass to "modprobe") include:
 
@@ -96,9 +108,20 @@
 	is 6, indicating 2^6 = 64 microframes. This controls how
 	often the EHCI controller can issue interrupts.
 
-The EHCI interrupt handler just acknowledges interrupts and schedules
-a tasklet to handle whatever needs handling. That keeps latencies low,
-no matter how often interrupts are issued.
+If you're using this driver on a 2.5 kernel, and you've enabled USB
+debugging support, you'll see three files in the "sysfs" directory for
+any EHCI controller:
+
+    "async" dumps the asynchronous schedule, used for control
+	and bulk transfers. Shows each active qh and the qtds
+	pending, usually one qtd per urb. (Look at it with
+	usb-storage doing disk I/O; watch the request queues!)
+    "periodic" dumps the periodic schedule, used for interrupt
+	and isochronous transfers. Doesn't show qtds.
+    "registers" show controller register state, and
+
+The contents of those files can help identify driver problems.
+
 
 Device drivers shouldn't care whether they're running over EHCI or not,
 but they may want to check for "usb_device->speed == USB_SPEED_HIGH".
@@ -107,6 +130,11 @@
 Also, some values in device descriptors (such as polling intervals for
 periodic transfers) use different encodings when operating at high speed.
 
+However, do make a point of testing device drivers through USB 2.0 hubs.
+Those hubs report some failures, such as disconnections, differently when
+transaction translators are in use; some drivers have been seen to behave
+badly when they see different faults than OHCI or UHCI report.
+
 
 PERFORMANCE
 
@@ -122,13 +150,18 @@
 and at most 13 of those fit into one USB 2.0 microframe. Eight USB 2.0
 microframes fit in a USB 1.1 frame; a microframe is 1 msec/8 = 125 usec.
 
+So more than 50 MByte/sec is available for bulk transfers, when both
+hardware and device driver software allow it. Periodic transfer modes
+(isochronous and interrupt) allow the larger packet sizes which let you
+approach the quoted 480 MBit/sec transfer rate.
+
 Hardware Performance
 
 At this writing, individual USB 2.0 devices tend to max out at around
 20 MByte/sec transfer rates. This is of course subject to change;
 and some devices now go faster, while others go slower.
 
-The NEC implementation of EHCI seems to have a hardware bottleneck
+The first NEC implementation of EHCI seems to have a hardware bottleneck
 at around 28 MByte/sec aggregate transfer rate. While this is clearly
 enough for a single device at 20 MByte/sec, putting three such devices
 onto one bus does not get you 60 MByte/sec. The issue appears to be
@@ -136,9 +169,11 @@
 so that it's only trying six (or maybe seven) USB transactions each
 microframe rather than thirteen. (Seems like a reasonable trade off
 for a product that beat all the others to market by over a year!)
+
 It's expected that newer implementations will better this, throwing
 more silicon real estate at the problem so that new motherboard chip
-sets will get closer to that 60 MByte/sec target.
+sets will get closer to that 60 MByte/sec target. That includes an
+updated implementation from NEC, as well as other vendors' silicon.
 
 There's a minimum latency of one microframe (125 usec) for the host
 to receive interrupts from the EHCI controller indicating completion
@@ -161,9 +196,15 @@
 sequence of 128 KB chunks would waste a lot less.
 
 But rather than depending on such large I/O buffers to make synchronous
-I/O be efficient, it's better to just queue all several (bulk) requests
+I/O be efficient, it's better to just queue up several (bulk) requests
 to the HC, and wait for them all to complete (or be canceled on error).
 Such URB queuing should work with all the USB 1.1 HC drivers too.
+
+In the Linux 2.5 kernels, new usb_sg_*() api calls have been defined; they
+queue all the buffers from a scatterlist. They also use scatterlist DMA
+mapping (which might apply an IOMMU) and IRQ reduction, all of which will
+help make high speed transfers run as fast as they can.
+
 
 TBD: Interrupt and ISO transfer performance issues. Those periodic
 transfers are fully scheduled, so the main issue is likely to be how
diff -Nru a/drivers/usb/hcd/ehci-dbg.c b/drivers/usb/hcd/ehci-dbg.c
--- a/drivers/usb/hcd/ehci-dbg.c	Thu Feb 20 12:07:09 2003
+++ b/drivers/usb/hcd/ehci-dbg.c	Thu Feb 20 12:07:09 2003
@@ -287,7 +287,26 @@
 	default: tmp = '?'; break; \
 	}; tmp; })
 
-static void qh_lines (struct ehci_qh *qh, char **nextp, unsigned *sizep)
+static inline char token_mark (u32 token)
+{
+	token = le32_to_cpu (token);
+	if (token & QTD_STS_ACTIVE)
+		return '*';
+	if (token & QTD_STS_HALT)
+		return '-';
+	if (QTD_PID (token) != 1 /* not IN: OUT or SETUP */
+			|| QTD_LENGTH (token) == 0)
+		return ' ';
+	/* tries to advance through hw_alt_next */
+	return '/';
+}
+
+static void qh_lines (
+	struct ehci_hcd *ehci,
+	struct ehci_qh *qh,
+	char **nextp,
+	unsigned *sizep
+)
 {
 	u32 scratch;
 	u32 hw_curr;
@@ -296,26 +315,49 @@
 	unsigned temp;
 	unsigned size = *sizep;
 	char *next = *nextp;
+	char mark;
 
+	mark = token_mark (qh->hw_token);
+	if (mark == '/') { /* qh_alt_next controls qh advance? */
+		if ((qh->hw_alt_next & QTD_MASK) == ehci->async->hw_alt_next)
+			mark = '#'; /* blocked */
+		else if (qh->hw_alt_next & cpu_to_le32 (0x01))
+			mark = '.'; /* use hw_qtd_next */
+		/* else alt_next points to some other qtd */
+	}
 	scratch = cpu_to_le32p (&qh->hw_info1);
-	hw_curr = cpu_to_le32p (&qh->hw_current);
+	hw_curr = (mark == '*') ? cpu_to_le32p (&qh->hw_current) : 0;
 	temp = snprintf (next, size,
-			"qh/%p dev%d %cs ep%d %08x %08x (%08x %08x)",
+			"qh/%p dev%d %cs ep%d %08x %08x (%08x%c %s nak%d)",
 			qh, scratch & 0x007f,
 			speed_char (scratch),
 			(scratch >> 8) & 0x000f,
 			scratch, cpu_to_le32p (&qh->hw_info2),
-			hw_curr, cpu_to_le32p (&qh->hw_token));
+			cpu_to_le32p (&qh->hw_token), mark,
+			(cpu_to_le32 (0x8000000) & qh->hw_token)
+				? "data0" : "data1",
+			(cpu_to_le32p (&qh->hw_alt_next) >> 1) & 0x0f);
 	size -= temp;
 	next += temp;
 
+	/* hc may be modifying the list as we read it ... */
 	list_for_each (entry, &qh->qtd_list) {
 		td = list_entry (entry, struct ehci_qtd, qtd_list);
 		scratch = cpu_to_le32p (&td->hw_token);
+		mark = ' ';
+		if (hw_curr == td->qtd_dma)
+			mark = '*';
+		else if (qh->hw_qtd_next == td->qtd_dma)
+			mark = '+';
+		else if (QTD_LENGTH (scratch)) {
+			if (td->hw_alt_next == ehci->async->hw_alt_next)
+				mark = '#';
+			else if (td->hw_alt_next != EHCI_LIST_END)
+				mark = '/';
+		}
 		temp = snprintf (next, size,
-				"\n\t%std/%p %s len=%d %08x urb %p",
-				(hw_curr == td->qtd_dma) ? "*" : "",
-				td, ({ char *tmp;
+				"\n\t%p%c%s len=%d %08x urb %p",
+				td, mark, ({ char *tmp;
 				 switch ((scratch>>8)&0x03) {
 				 case 0: tmp = "out"; break;
 				 case 1: tmp = "in"; break;
@@ -325,17 +367,31 @@
 				(scratch >> 16) & 0x7fff,
 				scratch,
 				td->urb);
+		if (temp < 0)
+			temp = 0;
+		else if (size < temp)
+			temp = size;
 		size -= temp;
 		next += temp;
+		if (temp == size)
+			goto done;
 	}
 
 	temp = snprintf (next, size, "\n");
-	*sizep = size - temp;
-	*nextp = next + temp;
+	if (temp < 0)
+		temp = 0;
+	else if (size < temp)
+		temp = size;
+	size -= temp;
+	next += temp;
+
+done:
+	*sizep = size;
+	*nextp = next;
 }
 
 static ssize_t
-show_async (struct device *dev, char *buf, size_t count, loff_t off)
+show_async (struct device *dev, char *buf)
 {
 	struct pci_dev *pdev;
 	struct ehci_hcd *ehci;
@@ -344,38 +400,36 @@
 	char *next;
 	struct ehci_qh *qh;
 
-	if (off != 0)
-		return 0;
-
 	pdev = container_of (dev, struct pci_dev, dev);
 	ehci = container_of (pci_get_drvdata (pdev), struct ehci_hcd, hcd);
 	next = buf;
-	size = count;
+	size = PAGE_SIZE;
 
 	/* dumps a snapshot of the async schedule.
 	 * usually empty except for long-term bulk reads, or head.
 	 * one QH per line, and TDs we know about
 	 */
 	spin_lock_irqsave (&ehci->lock, flags);
-	for (qh = ehci->async->qh_next.qh; qh; qh = qh->qh_next.qh)
-		qh_lines (qh, &next, &size);
-	if (ehci->reclaim) {
+	for (qh = ehci->async->qh_next.qh; size > 0 && qh; qh = qh->qh_next.qh)
+		qh_lines (ehci, qh, &next, &size);
+	if (ehci->reclaim && size > 0) {
 		temp = snprintf (next, size, "\nreclaim =\n");
 		size -= temp;
 		next += temp;
 
-		qh_lines (ehci->reclaim, &next, &size);
+		for (qh = ehci->reclaim; size > 0 && qh; qh = qh->reclaim)
+			qh_lines (ehci, qh, &next, &size);
 	}
 	spin_unlock_irqrestore (&ehci->lock, flags);
 
-	return count - size;
+	return PAGE_SIZE - size;
 }
 static DEVICE_ATTR (async, S_IRUGO, show_async, NULL);
 
 #define DBG_SCHED_LIMIT 64
 
 static ssize_t
-show_periodic (struct device *dev, char *buf, size_t count, loff_t off)
+show_periodic (struct device *dev, char *buf)
 {
 	struct pci_dev *pdev;
 	struct ehci_hcd *ehci;
@@ -385,8 +439,6 @@
 	char *next;
 	unsigned i, tag;
 
-	if (off != 0)
-		return 0;
 	if (!(seen = kmalloc (DBG_SCHED_LIMIT * sizeof *seen, SLAB_ATOMIC)))
 		return 0;
 	seen_count = 0;
@@ -394,7 +446,7 @@
 	pdev = container_of (dev, struct pci_dev, dev);
 	ehci = container_of (pci_get_drvdata (pdev), struct ehci_hcd, hcd);
 	next = buf;
-	size = count;
+	size = PAGE_SIZE;
 
 	temp = snprintf (next, size, "size = %d\n", ehci->periodic_size);
 	size -= temp;
@@ -436,7 +488,7 @@
 					scratch & 0x007f,
 					(scratch >> 8) & 0x000f,
 					p.qh->usecs, p.qh->c_usecs,
-					scratch >> 16);
+					0x7ff & (scratch >> 16));
 
 				/* FIXME TD info too */
 
@@ -478,14 +530,14 @@
 	spin_unlock_irqrestore (&ehci->lock, flags);
 	kfree (seen);
 
-	return count - size;
+	return PAGE_SIZE - size;
 }
 static DEVICE_ATTR (periodic, S_IRUGO, show_periodic, NULL);
 
 #undef DBG_SCHED_LIMIT
 
 static ssize_t
-show_registers (struct device *dev, char *buf, size_t count, loff_t off)
+show_registers (struct device *dev, char *buf)
 {
 	struct pci_dev *pdev;
 	struct ehci_hcd *ehci;
@@ -495,20 +547,18 @@
 	static char fmt [] = "%*s\n";
 	static char label [] = "";
 
-	if (off != 0)
-		return 0;
-
 	pdev = container_of (dev, struct pci_dev, dev);
 	ehci = container_of (pci_get_drvdata (pdev), struct ehci_hcd, hcd);
 	next = buf;
-	size = count;
+	size = PAGE_SIZE;
 
 	spin_lock_irqsave (&ehci->lock, flags);
 
 	/* Capability Registers */
 	i = readw (&ehci->caps->hci_version);
-	temp = snprintf (next, size, "EHCI %x.%02x, hcd state %d\n",
+	temp = snprintf (next, size,
+		"EHCI %x.%02x, hcd state %d (version " DRIVER_VERSION ")\n",
 		i >> 8, i & 0x0ff, ehci->hcd.state);
 	size -= temp;
 	next += temp;
@@ -578,7 +628,7 @@
 
 	spin_unlock_irqrestore (&ehci->lock, flags);
 
-	return count - size;
+	return PAGE_SIZE - size;
 }
 static DEVICE_ATTR (registers, S_IRUGO, show_registers, NULL);
 
diff -Nru a/drivers/usb/hcd/ehci-hcd.c b/drivers/usb/hcd/ehci-hcd.c
--- a/drivers/usb/hcd/ehci-hcd.c	Thu Feb 20 12:07:09 2003
+++ b/drivers/usb/hcd/ehci-hcd.c	Thu Feb 20 12:07:09 2003
@@ -93,7 +93,7 @@
  * 2001-June Works with usb-storage and NEC EHCI on 2.4
  */
 
-#define DRIVER_VERSION "2002-Dec-20"
+#define DRIVER_VERSION "2003-Jan-22"
 #define DRIVER_AUTHOR "David Brownell"
 #define DRIVER_DESC "USB 2.0 'Enhanced' Host Controller (EHCI) Driver"
 
@@ -107,14 +107,15 @@
 #define EHCI_STATS
 #endif
 
-#define INTR_AUTOMAGIC /* to be removed later in 2.5 */
+#define INTR_AUTOMAGIC /* urb lifecycle mode, gone in 2.5 */
 
 /* magic numbers that can affect system performance */
 #define EHCI_TUNE_CERR 3 /* 0-3 qtd retries; 0 == don't stop */
-#define EHCI_TUNE_RL_HS 0 /* nak throttle; see 4.9 */
+#define EHCI_TUNE_RL_HS 4 /* nak throttle; see 4.9 */
 #define EHCI_TUNE_RL_TT 0
 #define EHCI_TUNE_MULT_HS 1 /* 1-3 transactions/uframe; 4.10.3 */
 #define EHCI_TUNE_MULT_TT 1
+#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */
 
 #define EHCI_WATCHDOG_JIFFIES (HZ/100) /* arbitrary; ~10 msec */
 #define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */
@@ -406,9 +407,10 @@
 	 * streaming mappings for I/O buffers, like pci_map_single(),
 	 * can return segments above 4GB, if the device allows.
 	 *
-	 * NOTE: layered drivers can't yet tell when we enable that,
-	 * so they can't pass this info along (like NETIF_F_HIGHDMA)
-	 * (or like Scsi_Host.highmem_io) ... usb_bus.flags?
+	 * NOTE: the dma mask is visible through dma_supported(), so
+	 * drivers can pass this info along ... like NETIF_F_HIGHDMA,
+	 * Scsi_Host.highmem_io, and so forth. It's readonly to all
+	 * host side drivers though.
 	 */
 	if (HCC_64BIT_ADDR (hcc_params)) {
 		writel (0, &ehci->regs->segment);
@@ -416,13 +418,26 @@
 		ehci_info (ehci, "enabled 64bit PCI DMA\n");
 	}
 
+	/* help hc dma work well with cachelines */
+	pci_set_mwi (ehci->hcd.pdev);
+
 	/* clear interrupt enables, set irq latency */
 	temp = readl (&ehci->regs->command) & 0xff;
 	if (log2_irq_thresh < 0 || log2_irq_thresh > 6)
 		log2_irq_thresh = 0;
 	temp |= 1 << (16 + log2_irq_thresh);
 	// if hc can park (ehci >= 0.96), default is 3 packets per async QH
-	// keeping default periodic framelist size
+	if (HCC_PGM_FRAMELISTLEN (hcc_params)) {
+		/* periodic schedule size can be smaller than default */
+		temp &= ~(3 << 2);
+		temp |= (EHCI_TUNE_FLS << 2);
+		switch (EHCI_TUNE_FLS) {
+		case 0: ehci->periodic_size = 1024; break;
+		case 1: ehci->periodic_size = 512; break;
+		case 2: ehci->periodic_size = 256; break;
+		default: BUG ();
+		}
+	}
 	temp &= ~(CMD_IAAD | CMD_ASE | CMD_PSE),
 	// Philips, Intel, and maybe others need CMD_RUN before the
 	// root hub will detect new devices (why?); NEC doesn't
@@ -476,7 +491,6 @@
 		ehci_ready (ehci);
 		ehci_reset (ehci);
 		bus->root_hub = 0;
-		usb_free_dev (udev);
 		retval = -ENODEV;
 		goto done2;
 	}
@@ -637,9 +651,13 @@
 static void ehci_irq (struct usb_hcd *hcd, struct pt_regs *regs)
 {
 	struct ehci_hcd *ehci = hcd_to_ehci (hcd);
-	u32 status = readl (&ehci->regs->status);
+	u32 status;
 	int bh;
 
+	spin_lock (&ehci->lock);
+
+	status = readl (&ehci->regs->status);
+
 	/* e.g. cardbus physical eject */
 	if (status == ~(u32) 0) {
 		ehci_dbg (ehci, "device removed\n");
@@ -648,9 +666,7 @@
 
 	status &= INTR_MASK;
 	if (!status) /* irq sharing? */
-		return;
-
-	spin_lock (&ehci->lock);
+		goto done;
 
 	/* clear (just) interrupts */
 	writel (status, &ehci->regs->status);
@@ -693,6 +709,7 @@
 
 	if (bh)
 		ehci_work (ehci, regs);
+done:
 	spin_unlock (&ehci->lock);
 }
 
@@ -756,7 +773,6 @@
 	struct ehci_hcd *ehci = hcd_to_ehci (hcd);
 	struct ehci_qh *qh;
 	unsigned long flags;
-	int maybe_irq = 1;
 
 	spin_lock_irqsave (&ehci->lock, flags);
 	switch (usb_pipetype (urb->pipe)) {
@@ -766,23 +782,23 @@
 		qh = (struct ehci_qh *) urb->hcpriv;
 		if (!qh)
 			break;
-		while (qh->qh_state == QH_STATE_LINKED
+
+		/* if we need to use IAA and it's busy, defer */
+		if (qh->qh_state == QH_STATE_LINKED
 				&& ehci->reclaim
 				&& HCD_IS_RUNNING (ehci->hcd.state)
 				) {
-			spin_unlock_irqrestore (&ehci->lock, flags);
+			struct ehci_qh *last;
 
-			if (maybe_irq) {
-				if (in_interrupt ())
-					return -EAGAIN;
-				maybe_irq = 0;
-			}
-			/* let pending unlinks complete, so this can start */
-			wait_ms (1);
+			for (last = ehci->reclaim;
+					last->reclaim;
+					last = last->reclaim)
+				continue;
+			qh->qh_state = QH_STATE_UNLINK_WAIT;
+			last->reclaim = qh;
 
-			spin_lock_irqsave (&ehci->lock, flags);
-		}
-		if (!HCD_IS_RUNNING (ehci->hcd.state) && ehci->reclaim)
+		/* bypass IAA if the hc can't care */
+		} else if (!HCD_IS_RUNNING (ehci->hcd.state) && ehci->reclaim)
 			end_unlink_async (ehci, NULL);
 
 		/* something else might have unlinked the qh by now */
diff -Nru a/drivers/usb/hcd/ehci-mem.c b/drivers/usb/hcd/ehci-mem.c
--- a/drivers/usb/hcd/ehci-mem.c	Thu Feb 20 12:07:09 2003
+++ b/drivers/usb/hcd/ehci-mem.c	Thu Feb 20 12:07:09 2003
@@ -75,8 +75,6 @@
 	qtd = pci_pool_alloc (ehci->qtd_pool, flags, &dma);
 	if (qtd != 0) {
 		ehci_qtd_init (qtd, dma);
-		if (ehci->async)
-			qtd->hw_alt_next = ehci->async->hw_alt_next;
 	}
 	return qtd;
 }
diff -Nru a/drivers/usb/hcd/ehci-q.c b/drivers/usb/hcd/ehci-q.c
--- a/drivers/usb/hcd/ehci-q.c	Thu Feb 20 12:07:09 2003
+++ b/drivers/usb/hcd/ehci-q.c	Thu Feb 20 12:07:09 2003
@@ -43,7 +43,8 @@
 /* fill a qtd, returning how much of the buffer we were able to queue up */
 
 static int
-qtd_fill (struct ehci_qtd *qtd, dma_addr_t buf, size_t len, int token)
+qtd_fill (struct ehci_qtd *qtd, dma_addr_t buf, size_t len,
+		int token, int maxpacket)
 {
 	int i, count;
 	u64 addr = buf;
@@ -69,6 +70,10 @@
 			else
 				count = len;
 		}
+
+		/* short packets may only terminate transfers */
+		if (count != len)
+			count -= (count % maxpacket);
 	}
 	qtd->hw_token = cpu_to_le32 ((count << 16) | token);
 	qtd->length = count;
@@ -85,7 +90,7 @@
 {
 	qh->hw_current = 0;
 	qh->hw_qtd_next = QTD_NEXT (qtd->qtd_dma);
-	qh->hw_alt_next = ehci->async->hw_alt_next;
+	qh->hw_alt_next = EHCI_LIST_END;
 
 	/* HC must see latest qtd and qh data before we clear ACTIVE+HALT */
 	wmb ();
@@ -96,7 +101,7 @@
 
 #define IS_SHORT_READ(token) (QTD_LENGTH (token) != 0 && QTD_PID (token) == 1)
 
-static inline void qtd_copy_status (
+static void qtd_copy_status (
 	struct ehci_hcd *ehci,
 	struct urb *urb,
 	size_t length,
@@ -256,14 +261,26 @@
 static unsigned
 qh_completions (struct ehci_hcd *ehci, struct ehci_qh *qh, struct pt_regs *regs)
 {
-	struct ehci_qtd *last = 0;
+	struct ehci_qtd *last = 0, *end = qh->dummy;
 	struct list_head *entry, *tmp;
-	int stopped = 0;
+	int stopped;
 	unsigned count = 0;
+	int do_status = 0;
+	u8 state;
 
 	if (unlikely (list_empty (&qh->qtd_list)))
 		return count;
 
+	/* completions (or tasks on other cpus) must never clobber HALT
+	 * till we've gone through and cleaned everything up, even when
+	 * they add urbs to this qh's queue or mark them for unlinking.
+	 *
+	 * NOTE: unlinking expects to be done in queue order.
+	 */
+	state = qh->qh_state;
+	qh->qh_state = QH_STATE_COMPLETING;
+	stopped = (state == QH_STATE_IDLE);
+
 	/* remove de-activated QTDs from front of queue.
 	 * after faults (including short reads), cleanup this urb
 	 * then let the queue advance.
@@ -287,11 +304,14 @@
 			last = 0;
 		}
 
+		/* ignore urbs submitted during completions we reported */
+		if (qtd == end)
+			break;
+
 		/* hardware copies qtd out of qh overlay */
 		rmb ();
 		token = le32_to_cpu (qtd->hw_token);
 		stopped = stopped
-			|| (qh->qh_state == QH_STATE_IDLE)
 			|| (HALT_BIT & qh->hw_token) != 0
 			|| (ehci->hcd.state == USB_STATE_HALT);
 
@@ -301,36 +321,53 @@
 			/* magic dummy for short reads; won't advance */
 			if (IS_SHORT_READ (token)
 					&& !(token & QTD_STS_HALT)
-					&& ehci->async->hw_alt_next
-						== qh->hw_alt_next)
+					&& (qh->hw_alt_next & QTD_MASK)
+						== ehci->async->hw_alt_next) {
+				stopped = 1;
 				goto halt;
+			}
 
 		/* stop scanning when we reach qtds the hc is using */
 		} else if (likely (!stopped)) {
-			last = 0;
 			break;
 
 		} else {
-			/* ignore active qtds unless some previous qtd
+			/* ignore active urbs unless some previous qtd
 			 * for the urb faulted (including short read) or
 			 * its urb was canceled. we may patch qh or qtds.
 			 */
-			if ((token & QTD_STS_ACTIVE)
-					&& urb->status == -EINPROGRESS) {
-				last = 0;
+			if (likely (urb->status == -EINPROGRESS))
+				continue;
+
+			/* issue status after short control reads */
+			if (unlikely (do_status != 0)
+					&& QTD_PID (token) == 0 /* OUT */) {
+				do_status = 0;
 				continue;
 			}
+
+			/* token in overlay may be most current */
+			if (state == QH_STATE_IDLE
+					&& cpu_to_le32 (qtd->qtd_dma)
+						== qh->hw_current)
+				token = le32_to_cpu (qh->hw_token);
+
+			/* force halt for unlinked or blocked qh, so we'll
+			 * patch the qh later and so that completions can't
+			 * activate it while we "know" it's stopped.
+			 */
 			if ((HALT_BIT & qh->hw_token) == 0) {
 halt:
 				qh->hw_token |= HALT_BIT;
 				wmb ();
-				stopped = 1;
 			}
 		}
 
 		/* remove it from the queue */
 		spin_lock (&urb->lock);
 		qtd_copy_status (ehci, urb, qtd->length, token);
+		do_status = (urb->status == -EREMOTEIO)
+				&& usb_pipecontrol (urb->pipe);
 		spin_unlock (&urb->lock);
 
 		if (stopped && qtd->qtd_list.prev != &qh->qtd_list) {
@@ -349,6 +386,9 @@
 		ehci_qtd_free (ehci, last);
 	}
 
+	/* restore original state; caller must unlink or relink */
+	qh->qh_state = state;
+
 	/* update qh after fault cleanup */
 	if (unlikely ((HALT_BIT & qh->hw_token) != 0)) {
 		qh_update (ehci, qh,
@@ -397,7 +437,7 @@
 	struct ehci_qtd *qtd, *qtd_prev;
 	dma_addr_t buf;
 	int len, maxpacket;
-	int is_input, status_patch = 0;
+	int is_input;
 	u32 token;
 
 	/*
@@ -418,7 +458,7 @@
 	if (usb_pipecontrol (urb->pipe)) {
 		/* SETUP pid */
 		qtd_fill (qtd, urb->setup_dma, sizeof (struct usb_ctrlrequest),
-			token | (2 /* "setup" */ << 8));
+			token | (2 /* "setup" */ << 8), 8);
 
 		/* ... and always at least one more pid */
 		token ^= QTD_TOGGLE;
@@ -429,10 +469,6 @@
 		qtd->urb = urb;
 		qtd_prev->hw_next = QTD_NEXT (qtd->qtd_dma);
 		list_add_tail (&qtd->qtd_list, head);
-
-		if (len > 0 && is_input
-				&& !(urb->transfer_flags & URB_SHORT_NOT_OK))
-			status_patch = 1;
 	}
 
 	/*
@@ -443,6 +479,7 @@
 	else
 		buf = 0;
 
+	// FIXME this 'buf' check break some zlps...
 	if (!buf || is_input)
 		token |= (1 /* "in" */ << 8);
 	/* else it's already initted to "out" pid (0 << 8) */
@@ -457,9 +494,11 @@
 	for (;;) {
 		int this_qtd_len;
 
-		this_qtd_len = qtd_fill (qtd, buf, len, token);
+		this_qtd_len = qtd_fill (qtd, buf, len, token, maxpacket);
 		len -= this_qtd_len;
 		buf += this_qtd_len;
+		if (is_input)
+			qtd->hw_alt_next = ehci->async->hw_alt_next;
 
 		/* qh makes control packets use qtd toggle; maybe switch it */
 		if ((maxpacket & (this_qtd_len + (maxpacket - 1))) == 0)
@@ -477,6 +516,13 @@
 		list_add_tail (&qtd->qtd_list, head);
 	}
 
+	/* unless the bulk/interrupt caller wants a chance to clean
+	 * up after short reads, hc should advance qh past this urb
+	 */
+	if (likely ((urb->transfer_flags & URB_SHORT_NOT_OK) == 0
+				|| usb_pipecontrol (urb->pipe)))
+		qtd->hw_alt_next = EHCI_LIST_END;
+
 	/*
 	 * control requests may need a terminating data "status" ack;
 	 * bulk ones may need a terminating short packet (zero length).
@@ -503,23 +549,10 @@
 			list_add_tail (&qtd->qtd_list, head);
 
 			/* never any data in such packets */
-			qtd_fill (qtd, 0, 0, token);
+			qtd_fill (qtd, 0, 0, token, 0);
 		}
 	}
 
-	/* if we're permitting a short control read, we want the hardware to
-	 * just continue after short data and send the status ack. it can do
-	 * that on the last data packet (typically the only one). for other
-	 * packets, software fixup is needed (in qh_completions).
-	 */
-	if (status_patch) {
-		struct ehci_qtd *prev;
-
-		prev = list_entry (qtd->qtd_list.prev,
-					struct ehci_qtd, qtd_list);
-		prev->hw_alt_next = QTD_NEXT (qtd->qtd_dma);
-	}
-
 	/* by default, enable interrupt on urb completion */
 	if (likely (!(urb->transfer_flags & URB_NO_INTERRUPT)))
 		qtd->hw_token |= __constant_cpu_to_le32 (QTD_IOC);
@@ -641,7 +674,8 @@
 
 	case USB_SPEED_FULL:
 		/* EPS 0 means "full" */
-		info1 |= (EHCI_TUNE_RL_TT << 28);
+		if (type != PIPE_INTERRUPT)
+			info1 |= (EHCI_TUNE_RL_TT << 28);
 		if (type == PIPE_CONTROL) {
 			info1 |= (1 << 27); /* for TT */
 			info1 |= 1 << 14; /* toggle from qtd */
@@ -658,12 +692,13 @@
 
 	case USB_SPEED_HIGH: /* no TT involved */
 		info1 |= (2 << 12); /* EPS "high" */
-		info1 |= (EHCI_TUNE_RL_HS << 28);
 		if (type == PIPE_CONTROL) {
+			info1 |= (EHCI_TUNE_RL_HS << 28);
 			info1 |= 64 << 16; /* usb2 fixed maxpacket */
 			info1 |= 1 << 14; /* toggle from qtd */
 			info2 |= (EHCI_TUNE_MULT_HS << 30);
 		} else if (type == PIPE_BULK) {
+			info1 |= (EHCI_TUNE_RL_HS << 28);
 			info1 |= 512 << 16; /* usb2 fixed maxpacket */
 			info2 |= (EHCI_TUNE_MULT_HS << 30);
 		} else { /* PIPE_INTERRUPT */
@@ -799,8 +834,7 @@
 				&& !usb_pipecontrol (urb->pipe)) {
 			/* "never happens": drivers do stall cleanup right */
 			if (qh->qh_state != QH_STATE_IDLE
-					&& (cpu_to_le32 (QTD_STS_HALT)
-						& qh->hw_token) == 0)
+					&& qh->qh_state != QH_STATE_COMPLETING)
 				ehci_warn (ehci, "clear toggle dev%d "
 						"ep%d%s: not idle\n",
 						usb_pipedevice (urb->pipe),
@@ -839,7 +873,6 @@
 			__list_splice (qtd_list, qh->qtd_list.prev);
 
 			ehci_qtd_init (qtd, qtd->qtd_dma);
-			qtd->hw_alt_next = ehci->async->hw_alt_next;
 			qh->dummy = qtd;
 
 			/* hc must see the new dummy at list end */
@@ -907,9 +940,12 @@
 
 /* the async qh for the qtds being reclaimed are now unlinked from the HC */
 
+static void start_unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh);
+
 static void end_unlink_async (struct ehci_hcd *ehci, struct pt_regs *regs)
 {
 	struct ehci_qh *qh = ehci->reclaim;
+	struct ehci_qh *next;
 
 	del_timer (&ehci->watchdog);
 
@@ -920,6 +956,10 @@
 	ehci->reclaim = 0;
 	ehci->reclaim_ready = 0;
 
+	/* other unlink(s) may be pending (in QH_STATE_UNLINK_WAIT) */
+	next = qh->reclaim;
+	qh->reclaim = 0;
+
 	qh_completions (ehci, qh, regs);
 
 	if (!list_empty (&qh->qtd_list)
@@ -939,6 +979,9 @@
 					jiffies + EHCI_ASYNC_JIFFIES);
 		}
 	}
+
+	if (next)
+		start_unlink_async (ehci, next);
 }
 
 /* makes sure the async qh will become idle */
@@ -951,7 +994,8 @@
 
 #ifdef DEBUG
 	if (ehci->reclaim
-			|| qh->qh_state != QH_STATE_LINKED
+			|| (qh->qh_state != QH_STATE_LINKED
+				&& qh->qh_state != QH_STATE_UNLINK_WAIT)
 #ifdef CONFIG_SMP
 // this macro lies except on SMP compiles
 			|| !spin_is_locked (&ehci->lock)
@@ -983,6 +1027,9 @@
 	wmb ();
 
 	if (unlikely (ehci->hcd.state == USB_STATE_HALT)) {
+		/* if (unlikely (qh->reclaim != 0))
+		 * this will recurse, probably not much
+		 */
 		end_unlink_async (ehci, NULL);
 		return;
 	}
@@ -1001,25 +1048,28 @@
 scan_async (struct ehci_hcd *ehci, struct pt_regs *regs)
 {
 	struct ehci_qh *qh;
-	unsigned count;
 
+	if (!++(ehci->stamp))
+		ehci->stamp++;
 rescan:
 	qh = ehci->async->qh_next.qh;
-	count = 0;
 	if (likely (qh != 0)) {
 		do {
 			/* clean any finished work for this qh */
-			if (!list_empty (&qh->qtd_list)) {
+			if (!list_empty (&qh->qtd_list)
+					&& qh->stamp != ehci->stamp) {
 				int temp;
 
 				/* unlinks could happen here; completion
-				 * reporting drops the lock.
+				 * reporting drops the lock. rescan using
+				 * the latest schedule, but don't rescan
+				 * qhs we already finished (no looping).
 				 */
 				qh = qh_get (qh);
+				qh->stamp = ehci->stamp;
 				temp = qh_completions (ehci, qh, regs);
 				qh_put (ehci, qh);
 				if (temp != 0) {
-					count += temp;
 					goto rescan;
 				}
 			}
diff -Nru a/drivers/usb/hcd/ehci.h b/drivers/usb/hcd/ehci.h
--- a/drivers/usb/hcd/ehci.h	Thu Feb 20 12:07:09 2003
+++ b/drivers/usb/hcd/ehci.h	Thu Feb 20 12:07:09 2003
@@ -81,6 +81,7 @@
 	struct pci_pool *sitd_pool; /* sitd per split iso urb */
 
 	struct timer_list watchdog;
+	unsigned stamp;
 
 #ifdef EHCI_STATS
 	struct ehci_stats stats;
@@ -235,12 +236,12 @@
 	/* the rest is HCD-private */
 	dma_addr_t qtd_dma; /* qtd address */
 	struct list_head qtd_list; /* sw qtd list */
-
-	/* dma same in urb's qtds, except 1st control qtd (setup buffer) */
 	struct urb *urb; /* qtd's urb */
 	size_t length; /* length of buffer */
 } __attribute__ ((aligned (32)));
 
+#define QTD_MASK cpu_to_le32 (~0x1f) /* mask NakCnt+T in qh->hw_alt_next */
+
 /*-------------------------------------------------------------------------*/
 
 /* type tag from {qh,itd,sitd,fstn}->hw_next */
@@ -304,13 +305,17 @@
 	union ehci_shadow qh_next; /* ptr to qh; or periodic */
 	struct list_head qtd_list; /* sw qtd list */
 	struct ehci_qtd *dummy;
+	struct ehci_qh *reclaim; /* next to reclaim */
 
 	atomic_t refcount;
+	unsigned stamp;
 
 	u8 qh_state;
 #define QH_STATE_LINKED 1 /* HC sees this */
 #define QH_STATE_UNLINK 2 /* HC may still see this */
 #define QH_STATE_IDLE 3 /* HC doesn't see this */
+#define QH_STATE_UNLINK_WAIT 4 /* LINKED and on reclaim q */
+#define QH_STATE_COMPLETING 5 /* don't touch token.HALT */
 
 	/* periodic schedule info */
 	u8 usecs; /* intr bandwidth */
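
Reviewer's sketch (not part of the patch): the qtd_fill() change in ehci-q.c above adds the rule that "short packets may only terminate transfers". The standalone helper below illustrates that rounding outside the kernel. The name trim_to_maxpacket() and the zero-maxpacket guard are assumptions made for this illustration; the patch itself performs the modulo inline and only passes maxpacket == 0 when count == len == 0.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only (not in the patch).
 *
 * count:     bytes the current qtd's buffer pointers could cover
 * len:       bytes still remaining in the urb
 * maxpacket: endpoint maximum packet size
 *
 * Returns how many bytes to queue in this qtd.  A qtd that does not
 * finish the transfer is rounded down to a multiple of maxpacket, so
 * that only the last qtd of a transfer can end in a short packet.
 */
static size_t trim_to_maxpacket(size_t count, size_t len, size_t maxpacket)
{
	/* this qtd carries the rest of the transfer; it may end short */
	if (count == len)
		return count;

	/* assumption: guard against dividing by zero in this sketch */
	if (maxpacket == 0)
		return count;

	/* mirrors "count -= (count % maxpacket)" from the patch */
	return count - (count % maxpacket);
}
```

For example, with a 512 byte bulk maxpacket, a qtd that could cover 16484 bytes of a 65536 byte request would be trimmed to 16384 bytes, and the remaining 100 bytes would move to the next qtd.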