Current-Users archive
Re: Acer M5229 IDE bugs (esp. on sparc64)
[...sorry for the necro-posting, just happened to get back to this last
night...]
Manuel Bouyer wrote:
On Thu, Feb 14, 2008 at 02:17:47PM -0500, Rafal Boni wrote:
[...]
I see that wdc->dma_status is always 0x04 (WDC_DMAST_UNDER), which is a
synthetic error generated only by pciide_dma_finish(). I'm guessing
that the suspect pciide_dma_finish() is the one called from wdcintr().
Because the rev of the M1559 IDE controller I have doesn't have a
chan-id register to determine which channel caused an interrupt, for
this chip we end up *always* checking both channels, and the code in
wdcintr() / pciide_dma_finish() looks very suspicious... stop DMA first,
ask questions later.
AFAIK it's not: before stopping the DMA channel, we check if the controller
did interrupt (status & IDEDMA_CTL_INTR).
We have to stop the DMA first, because in some cases the DMA engine will still
be active at end of transfer (if the device has less data to return than
requested for example - as the comment says, it's a valid condition for
ATAPI devices).
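For readers following along, the order of checks described above can be sketched as a pure function over the bus-master status byte. This is a rough illustration, not the actual pciide_dma_finish(): the function name dma_finish_sketch is made up, the IDEDMA_CTL_* values follow the usual bus-master IDE status layout, and the enum values and every WDC_DMAST_* value other than WDC_DMAST_UNDER (0x04, quoted earlier in this thread) are assumptions.

```c
#include <stdint.h>

/* Bus-master IDE status bits (usual layout; assumed to match this chip). */
#define IDEDMA_CTL_ACT   0x01	/* DMA engine still active */
#define IDEDMA_CTL_ERR   0x02	/* DMA error */
#define IDEDMA_CTL_INTR  0x04	/* controller raised an interrupt */

#define WDC_DMAST_NOIRQ  0x01	/* assumed value */
#define WDC_DMAST_ERR    0x02	/* assumed value */
#define WDC_DMAST_UNDER  0x04	/* value quoted in this thread */

/* Numeric values of this enum are assumptions for the sketch. */
enum dma_end { WDC_DMAEND_END, WDC_DMAEND_ABRT, WDC_DMAEND_ABRT_QUIET };

/*
 * Hypothetical stand-in for the decision logic only: takes the status
 * byte that would have been read from the controller and returns the
 * synthetic WDC_DMAST_* result.
 */
static int
dma_finish_sketch(uint8_t status, enum dma_end force)
{
	int error = 0;

	/* Not our interrupt: leave the DMA engine alone. */
	if ((status & IDEDMA_CTL_INTR) == 0 && force == WDC_DMAEND_END)
		return WDC_DMAST_NOIRQ;

	/* (The real driver would stop the DMA engine at this point.) */

	if (force == WDC_DMAEND_ABRT_QUIET)
		return 0;
	if ((status & IDEDMA_CTL_ERR) != 0)
		error |= WDC_DMAST_ERR;
	/* Engine was still busy when stopped: reported as an underrun. */
	if ((status & IDEDMA_CTL_ACT) != 0)
		error |= WDC_DMAST_UNDER;
	return error;
}
```

Under these assumptions, a status with both INTR and ACT set on a normal end-of-transfer yields exactly WDC_DMAST_UNDER, which is the value reported at the top of this message.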
Ok, in my case the usage shouldn't include any ATAPI to speak of (one
disk on each channel which are part of a RAID-1, an un-used CD-ROM on
one of the channels). If I add a bit of debug to pciide_dma_finish like
the below:
@@ -768,6 +769,12 @@ pciide_dma_finish(v, channel, drive, for
 	}
 	if ((status & IDEDMA_CTL_ACT) != 0 && force != WDC_DMAEND_ABRT_QUIET) {
+		if (force == WDC_DMAEND_END) {
+			aprint_error("%s:%d:%d: stopping still-busy xfer, "
+			    "status=0x%x\n",
+			    device_xname(sc->sc_wdcdev.sc_atac.atac_dev),
+			    channel, drive, status);
+		}
I see pretty frequent messages like:
aceride0:1:0: stopping still-busy xfer, status=0x65
or
aceride0:0:0: stopping still-busy xfer, status=0x25
during the RAID parity rebuild after a dirty reboot (in this case I
forced a dirty reboot just to test). Channel 1 is the one with the
CDROM. The messages also show up at other times besides the parity
rebuild; that's just the easiest way to guarantee that I'll get them.
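To make the two status values easier to read, here is a small decoder sketch. It assumes the usual bus-master IDE status register layout (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 per-drive DMA capable, bit 7 simplex); the BM_ST_* names and bm_status_decode() are made up for this illustration and are not driver identifiers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Usual bus-master IDE status bits; names are local to this sketch. */
#define BM_ST_ACT       0x01
#define BM_ST_ERR       0x02
#define BM_ST_INTR      0x04
#define BM_ST_DRV0_DMA  0x20
#define BM_ST_DRV1_DMA  0x40
#define BM_ST_SIMPLEX   0x80

static const struct {
	uint8_t bit;
	const char *name;
} bm_bits[] = {
	{ BM_ST_ACT, "ACT" },
	{ BM_ST_ERR, "ERR" },
	{ BM_ST_INTR, "INTR" },
	{ BM_ST_DRV0_DMA, "DRV0_DMA" },
	{ BM_ST_DRV1_DMA, "DRV1_DMA" },
	{ BM_ST_SIMPLEX, "SIMPLEX" },
};

/* Write the space-separated names of the set bits into buf; returns buf. */
static char *
bm_status_decode(uint8_t status, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(bm_bits) / sizeof(bm_bits[0]); i++) {
		int n;

		if ((status & bm_bits[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used != 0 ? " " : "", bm_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

With that layout, 0x65 decodes as "ACT INTR DRV0_DMA DRV1_DMA" and 0x25 as "ACT INTR DRV0_DMA". In both cases the interrupt bit and the active bit are set at the same time, which fits the reading that the interrupt arrives while the engine is still busy.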
Another interesting thing is that a hack I took from OpenBSD, to not skip
channels that don't have the WDCF_IRQ_WAIT flag set in pciide_pci_intr()
[1], seemed to make the controller behave better: far fewer of the DMA
errors with status WDC_DMAST_UNDER, and in fact the interface downgraded
to Ultra/33 (from /66 originally) and then produced no further errors.
That last bit does make me wonder if this is really a confluence of two
things -- some generic interrupt / DMA handling error, along with either
a setup bug for Ultra/66 mode or the inability of the chip to handle
Ultra/66 transfers on both channels. However, as I said before, the
FreeBSD fix for ATA66 byte-count-something-or-other didn't help here.
If this chip asserts IDEDMA_CTL_INTR spuriously, and we need to check
IDEDMA_CTL_ACT instead, then it's broken. So it needs a private intr
routine, and it needs to disable DMA for ATAPI devices.
From the above debug code I added, it does look like the interrupt is
getting there early, so maybe this needs to be done. However, I hate
adding code like that without having any other platform with this
god-forsaken chip in it to test on.
--rafal
[1]
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/pciide.c.diff?r1=1.266&r2=1.267&f=h