tech-kern archive
SunFire v100 / Acer M5229 IDE DMA error workaround
Folks:
I've been taunted by the IDE interface on my SunFire V100 for a long
(LOOONG!) time with messages along the lines of:
wdNN: DMA error writing fsbn xxxx of xxxx-yyy (wdN bn pppp; cn ccc tn tt sn ss), retrying
wdN: soft error (corrected)
This box is running RAIDFrame over two 110GB IDE drives, one on each
channel. While the errors have not caused any data loss, they do
eventually cause the IDE subsystem to downgrade to slower and slower
DMA modes, and finally even to PIO access to the disks. See [1]
and the messages in that thread for my prior attempts at getting
rid of these errors.
I've since tried a bunch more stuff, and found that none of the
aceride-specific changes made any real difference. It looks like
for whatever reason, the chip asserts interrupts before the DMA
is complete, or at least the PCI IDE code believes that is the
case. So last night, after looking at the FreeBSD and OpenBSD
IDE code, I came up with the following set of changes, which
so far has not had any negative consequences on the system and
has survived a complete RAID parity rebuild (this was the one
case where I *always* got the DMA errors) without spewing a
single IDE DMA-related error. It also makes the box feel a
bit snappier, but maybe I'm just imagining that ;)
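In bus-master status register terms, the change boils down to which bit pattern counts as "transfer finished". Here is a minimal C sketch of the old and new tests, not kernel code: the bit values are assumed to follow the SFF-8038i layout that the IDEDMA_CTL_* constants mirror, and old_dma_done()/new_dma_done() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus-master IDE status register bits; values assumed to match the
 * SFF-8038i layout mirrored by NetBSD's IDEDMA_CTL_* constants. */
#define IDEDMA_CTL_ACT  0x01 /* DMA engine still active */
#define IDEDMA_CTL_ERR  0x02 /* bus-master error */
#define IDEDMA_CTL_INTR 0x04 /* device has raised its interrupt */

/* Original test: any pending interrupt means the transfer is finished. */
bool old_dma_done(uint8_t status)
{
    return (status & IDEDMA_CTL_INTR) != 0;
}

/* Patched test: the interrupt must be pending AND the engine must be
 * idle; otherwise pciide_dma_finish() reports WDC_DMAST_NOIRQ.
 * The ERR bit is ignored here and examined later in the real code. */
bool new_dma_done(uint8_t status)
{
    return (status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) == IDEDMA_CTL_INTR;
}
```

With status 0x05 (INTR and ACT both set) the old test declares the DMA done while the new one does not, so the patched pciide_dma_finish() returns WDC_DMAST_NOIRQ instead of tearing down a transfer the engine still considers active.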
I realize this change is probably done in the wrong place -- I
should probably have created an aceride-specific dma_finish method
and done the checks there, but this is at least a proof of concept
that the change works. The change also includes some extra debug
logging for DMA errors, which isn't necessary to fix the issue
but helped me diagnose it, so I've left it in for now.
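A chip-specific hook could look roughly like the sketch below. It is purely illustrative: struct mock_channel, generic_dma_finish() and aceride_dma_finish() are hypothetical stand-ins (the real pciide_dma_finish() takes a softc pointer, channel, drive and force flag, and reads the status register itself), and the numeric values given to WDC_DMAEND_END and WDC_DMAST_NOIRQ are assumptions made for the sketch.

```c
#include <stdint.h>

#define IDEDMA_CTL_ACT   0x01
#define IDEDMA_CTL_INTR  0x04
#define WDC_DMAEND_END   0    /* assumed value, illustration only */
#define WDC_DMAST_NOIRQ  0x01 /* assumed value, illustration only */

/* Hypothetical, simplified channel with a per-chip dma_finish hook. */
struct mock_channel {
    uint8_t status; /* simulated bus-master status register */
    int (*dma_finish)(struct mock_channel *, int force);
};

/* Generic finish: unchanged behaviour, trusts the INTR bit alone. */
int generic_dma_finish(struct mock_channel *ch, int force)
{
    if (force == WDC_DMAEND_END && (ch->status & IDEDMA_CTL_INTR) == 0)
        return WDC_DMAST_NOIRQ;
    /* ... stop the engine, sync and unload the DMA map ... */
    return 0;
}

/* Hypothetical aceride wrapper: applies the stricter
 * "INTR set, ACT clear" rule, then defers to the generic code. */
int aceride_dma_finish(struct mock_channel *ch, int force)
{
    if (force == WDC_DMAEND_END &&
        (ch->status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) != IDEDMA_CTL_INTR)
        return WDC_DMAST_NOIRQ;
    return generic_dma_finish(ch, force);
}
```

Keeping the stricter test in a wrapper like this would leave other pciide chips, and their ATAPI behaviour, exactly as they are today.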
Finally, I know Manuel mentioned that doing something along these
lines would likely have an impact on ATAPI DMA operations, and I
have not tested it with anything beyond ATA disks. However, I'm
not sure that ATAPI DMA ever worked on my V100; I think it always
falls back to PIO, at least with the CD-ROM in the system.
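For reference, this is how I read the INTR/ACT combinations in the SFF-8038i bus-master IDE spec; the enum and decode_dma_state() are hypothetical names for a sketch. The case that matters for ATAPI is a device that moves fewer bytes than the PRD table describes, which leaves both bits set.

```c
#include <stdint.h>

#define IDEDMA_CTL_ACT  0x01
#define IDEDMA_CTL_INTR 0x04

/* Meaning of the INTR/ACT pair when the interrupt is serviced,
 * summarized from the SFF-8038i bus-master IDE specification. */
enum dma_state {
    DMA_IN_PROGRESS,  /* INTR=0, ACT=1: transfer still running         */
    DMA_COMPLETE,     /* INTR=1, ACT=0: device done, PRD table used up  */
    DMA_UNDERRUN,     /* INTR=1, ACT=1: device finished early; legal
                         for ATAPI, but the patched check says NOIRQ   */
    DMA_PRD_TOO_SMALL /* INTR=0, ACT=0: PRD table exhausted first       */
};

enum dma_state decode_dma_state(uint8_t status)
{
    int intr = (status & IDEDMA_CTL_INTR) != 0;
    int act  = (status & IDEDMA_CTL_ACT) != 0;

    if (intr)
        return act ? DMA_UNDERRUN : DMA_COMPLETE;
    return act ? DMA_IN_PROGRESS : DMA_PRD_TOO_SMALL;
}
```

At WDC_DMAEND_END time the patched check returns WDC_DMAST_NOIRQ for DMA_UNDERRUN, which is the very case the existing "data underrun, may be a valid condition for ATAPI" comment in pciide_dma_finish() is there to handle.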
Patch below... I'd love comments / feedback, esp. on ATAPI use cases,
--rafal
---8<------8<------8<------8<------8<------8<------8<------8<------8<---
Index: ata/ata_wdc.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata_wdc.c,v
retrieving revision 1.90
diff -u -p -r1.90 ata_wdc.c
--- ata/ata_wdc.c 2 Oct 2008 21:05:17 -0000 1.90
+++ ata/ata_wdc.c 29 Oct 2008 15:30:20 -0000
@@ -681,6 +681,10 @@ wdc_ata_bio_intr(struct ata_channel *chp
}
if (wdc->dma_status != 0) {
if (drv_err != WDC_ATA_ERR) {
+ printf("%s:%d:%d: DMA error (st=0x%x, er=0x%x)\n",
+ device_xname(atac->atac_dev),
+ chp->ch_channel, xfer->c_drive,
+ wdc->dma_status, ata_bio->r_error);
ata_bio->error = ERR_DMA;
drv_err = WDC_ATA_ERR;
}
Index: pci/pciide_common.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/pciide_common.c,v
retrieving revision 1.38
diff -u -p -r1.38 pciide_common.c
--- pci/pciide_common.c 18 Mar 2008 20:46:37 -0000 1.38
+++ pci/pciide_common.c 29 Oct 2008 15:30:21 -0000
@@ -737,7 +738,9 @@ pciide_dma_finish(v, channel, drive, for
ATADEBUG_PRINT(("pciide_dma_finish: status 0x%x\n", status),
DEBUG_XFERS);
- if (force == WDC_DMAEND_END && (status & IDEDMA_CTL_INTR) == 0)
+ /* XXXrkb: From FreeBSD; should probably add an evcnt here */
+ if (force == WDC_DMAEND_END &&
+ ((status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) != IDEDMA_CTL_INTR))
return WDC_DMAST_NOIRQ;
/* stop DMA channel */
@@ -752,6 +755,9 @@ pciide_dma_finish(v, channel, drive, for
BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(sc->sc_dmat, dma_maps->dmamap_xfer);
+ /* Clear status bits */
+ bus_space_write_1(sc->sc_dma_iot, cp->dma_iohs[IDEDMA_CTL], 0, status);
+
if ((status & IDEDMA_CTL_ERR) != 0 && force != WDC_DMAEND_ABRT_QUIET) {
aprint_error("%s:%d:%d: bus-master DMA error: status=0x%x\n",
device_xname(sc->sc_wdcdev.sc_atac.atac_dev), channel,
@@ -768,6 +774,12 @@ pciide_dma_finish(v, channel, drive, for
}
if ((status & IDEDMA_CTL_ACT) != 0 && force != WDC_DMAEND_ABRT_QUIET) {
+ if (force == WDC_DMAEND_END) {
+ aprint_error("%s:%d:%d: stopping still-busy xfer, "
+ "status=0x%x\n",
+ device_xname(sc->sc_wdcdev.sc_atac.atac_dev),
+ channel, drive, status);
+ }
/* data underrun, may be a valid condition for ATAPI */
error |= WDC_DMAST_UNDER;
}
---8<------8<------8<------8<------8<------8<------8<------8<------8<---
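The "Clear status bits" hunk writes the value just read back into the status register. Assuming SFF-8038i semantics, where INTR and ERR are write-1-to-clear and ACT is read-only, that acknowledges exactly the bits that were set. A small simulation of those semantics follows; sim_status_reg and sim_status_write() are hypothetical, and the kernel does the real write with bus_space_write_1().

```c
#include <stdint.h>

#define IDEDMA_CTL_ACT  0x01 /* read-only: engine active */
#define IDEDMA_CTL_ERR  0x02 /* write 1 to clear */
#define IDEDMA_CTL_INTR 0x04 /* write 1 to clear */

/* Simulated bus-master status register. Only the two W1C bits react
 * to a write here; the read/write "drive DMA capable" bits would keep
 * their value anyway when the read value is written back verbatim. */
struct sim_status_reg {
    uint8_t value;
};

void sim_status_write(struct sim_status_reg *reg, uint8_t data)
{
    /* A 1 written to ERR or INTR clears that bit; a 0 leaves it alone. */
    reg->value &= (uint8_t)~(data & (IDEDMA_CTL_ERR | IDEDMA_CTL_INTR));
}
```

Writing back the read value, rather than a constant, also means a bit that was not set at read time is never touched.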
[1] http://mail-index.netbsd.org/port-sparc64/2008/02/14/msg000101.html
--
Time is an illusion; lunchtime, doubly so. |/\/\| Rafal Boni
-- Ford Prefect |\/\/|
rafal%pobox.com@localhost