tech-kern archive


Re: SunFire v100 / Acer M5229 IDE DMA error workaround



On Wed, Oct 29, 2008 at 12:05:39PM -0400, Rafal Boni wrote:
> Folks:
>       I've been taunted by the IDE interface on my SunFire V100 for a long
>       (LOOONG!) time with messages along the lines of:
> 
>       wdNN: DMA error writing fsbn xxxx of xxxx-yyy (wdN bn pppp; cn ccc tn tt sn ss), retrying
>       wdN: soft error (corrected)
> 
>       This box is running RAIDFrame over 2 110GB IDE drives, one on each
>       channel.  While the errors have not caused any data loss, they do
>       eventually cause the IDE subsystem to downgrade to slower-and-slower
>       DMA modes and eventually even to PIO access to the disks.  See [1]
>       and the messages in that thread for my prior attempts at getting
>       rid of these errors.
> 
>       I've since tried a bunch more stuff, and found that none of the
>       aceride-specific changes made any real difference.  It looks like
>       for whatever reason, the chip asserts interrupts before the DMA
>       is complete, or the PCI IDE code at least believed that was the
>       case.  So last night, after looking at the FreeBSD and OpenBSD
>       IDE code, I came up with the following set of changes, which 
>       so far has not had any negative consequences on the system and
>       has survived a complete RAID parity rebuild (this was the one
>       case where I *always* got the DMA errors) without spewing a
>       single IDE DMA-related error.  It also makes the box feel a
>       bit snappier, but maybe I'm just imagining that ;)
> 
>       I realize this change is probably done in the wrong place -- I
>       should probably have created an aceride-specific dma_finish method
>       and done the checks there, but this is at least a proof-of-concept
>       that the change works; the change also includes some more debug
>       logging in the case of DMA errors, which aren't necessary to fix
>       the issue but helped me diagnose it, so I've left them in for now.
> 
>       Finally, I know Manuel mentioned that doing something along these
>       lines would likely have an impact on ATAPI DMA operations, and I
>       have not tested it with anything beyond ATA disk -- however, I'm
>       not sure that ATAPI DMA ever worked on my V100 -- I think it always
>       falls back to PIO, at least with the CDROM in the system.
> 
> Patch below... I'd love comments / feedback, esp. on ATAPI use cases,
> --rafal
> 
> ---8<------8<------8<------8<------8<------8<------8<------8<------8<---
> Index: pci/pciide_common.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/pci/pciide_common.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 pciide_common.c
> --- pci/pciide_common.c       18 Mar 2008 20:46:37 -0000      1.38
> +++ pci/pciide_common.c       29 Oct 2008 15:30:21 -0000
> @@ -737,7 +738,9 @@ pciide_dma_finish(v, channel, drive, for
>       ATADEBUG_PRINT(("pciide_dma_finish: status 0x%x\n", status),
>           DEBUG_XFERS);
>  
> -     if (force == WDC_DMAEND_END && (status & IDEDMA_CTL_INTR) == 0)
> +     /* XXXrkb: From FreeBSD; should probably add an evcnt here */
> +     if (force == WDC_DMAEND_END && 
> +         ((status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) != IDEDMA_CTL_INTR))
>               return WDC_DMAST_NOIRQ;

I have a hunch that this is not necessary.  After you introduce the new
bus_space_write_1() call, below, does the status register ever show both
IDEDMA_CTL_INTR and IDEDMA_CTL_ACT set at once?
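
To make the difference between the two tests concrete, here is a small
standalone sketch, not driver code.  The bit values are local stand-ins
that assume the usual bus-master IDE status layout (ACT in bit 0, INTR in
bit 2), and noirq_old() / noirq_new() are hypothetical helper names used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins for the status bits.  The values are an assumption
 * based on the usual bus-master IDE status register layout.
 */
#define IDEDMA_CTL_ACT  0x01
#define IDEDMA_CTL_INTR 0x04

/* Original test: treat the interrupt as "not ours" only if INTR is clear. */
bool
noirq_old(uint8_t status)
{
	return (status & IDEDMA_CTL_INTR) == 0;
}

/*
 * Patched test: also treat it as "not ours" when INTR is set but the
 * engine still reports ACT, i.e. the transfer has not finished yet.
 */
bool
noirq_new(uint8_t status)
{
	return (status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) != IDEDMA_CTL_INTR;
}
```

The two tests disagree only for a status with both bits set (0x05 under
the assumed values): the original test proceeds to finish the DMA, the
patched one returns WDC_DMAST_NOIRQ.  If that combination never shows up
once the status bits are cleared, the first hunk changes nothing.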

>       /* stop DMA channel */
> @@ -752,6 +755,9 @@ pciide_dma_finish(v, channel, drive, for
>           BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
>       bus_dmamap_unload(sc->sc_dmat, dma_maps->dmamap_xfer);
>  
> +     /* Clear status bits */
> +     bus_space_write_1(sc->sc_dma_iot, cp->dma_iohs[IDEDMA_CTL], 0, status);
> +

I may be missing something, but by my reading of a PCI IDE controller
spec that I scrounged off the web, it is important to acknowledge the
interrupt in this way.  ISTM that the code should already acknowledge
the interrupt by calling pciide_irqack().  Not so?

Note that this write may not be flushed to the device, and the
interrupt deasserted, until a second call to pciide_dma_finish() calls
bus_space_read_1(, cp->dma_iohs[IDEDMA_CTL], ).  In other words, you
may take two interrupts per DMA completed.
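
A toy model of that concern, assuming the write really can sit in a
buffer until the next read of the same register: the device below is
entirely simulated, and struct fake_bmide, fake_write_1(), fake_read_1()
and irq_asserted() are made-up names, not bus_space(9) functions.  The
INTR and ERR bits are modelled as write-one-to-clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_CTL_ERR   0x02	/* assumed layout, for illustration only */
#define FAKE_CTL_INTR  0x04
#define FAKE_W1C_MASK  (FAKE_CTL_ERR | FAKE_CTL_INTR)

/* Simulated status register with a one-entry posted-write buffer. */
struct fake_bmide {
	uint8_t status;		/* what the device itself currently holds */
	uint8_t posted;		/* value waiting in the write buffer */
	bool	has_posted;	/* true while a write is still buffered */
};

/* A write only reaches the buffer; the device does not see it yet. */
void
fake_write_1(struct fake_bmide *d, uint8_t v)
{
	d->posted = v;
	d->has_posted = true;
}

/* A read pushes any buffered write to the device first, then reads. */
uint8_t
fake_read_1(struct fake_bmide *d)
{
	if (d->has_posted) {
		/* write-one-to-clear: each 1 written clears that bit */
		d->status &= (uint8_t)~(d->posted & FAKE_W1C_MASK);
		d->has_posted = false;
	}
	return d->status;
}

/* The interrupt line follows the device-side INTR bit. */
bool
irq_asserted(const struct fake_bmide *d)
{
	return (d->status & FAKE_CTL_INTR) != 0;
}
```

In this model, calling fake_write_1() with the INTR bit set leaves
irq_asserted() true until a later fake_read_1() runs.  Reading the
register back right after the clearing write would close that window, at
the cost of one extra register access per transfer.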

Dave

-- 
David Young             OJC Technologies
dyoung%ojctech.com@localhost      Urbana, IL * (217) 278-3933 ext 24

