Subject: Re: wd0 intermittent disk errors (correctable soft-errors, DMA error: missing interrupt, etc.)
To: None <davef1624@aol.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 08/24/2005 00:02:00
On Tue, Aug 23, 2005 at 05:25:22PM -0400, davef1624@aol.com wrote:
> 
> I have a 2 GHz, Pentium-4 based system, using 40 GB Hitachi Travelstars 
> IDE disks.
> 
> We are seeing the following errors intermittently on the system:
> 
> wd0a: error reading fsbn 512864 of 512864-512991 (wd0 bn 512864; cn 508 
> tn 12 sn 44), retrying
> wd0: (aborted command, interface CRC error)
> wd0: soft error (corrected)

This is harmless as long as it doesn't occur often. It means that
the data got corrupted during a transfer on the IDE bus, and that this was
detected by the Ultra-DMA CRC check (in this case the driver just redoes
the transfer). Occasional CRC errors are expected on Ultra-DMA IDE
buses; the bus just can't do reliable data transmission at this speed
(PATA Ultra-DMA could be called a hardware hack :)
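
To judge whether "often" applies, one option is to count these messages
in the kernel log. The following is only a rough sketch: it assumes the
log lines look exactly like the ones quoted above, and the function name
count_crc_errors is made up for illustration, not part of any NetBSD tool.

```python
import re

# Assumed message format, copied from the quoted log line:
#   wd0: (aborted command, interface CRC error)
CRC_PATTERN = re.compile(r"^(wd\d+): \(aborted command, interface CRC error\)")


def count_crc_errors(log_text):
    """Return a dict mapping drive name to number of CRC error lines."""
    counts = {}
    for line in log_text.splitlines():
        # Tolerate mail-style quoting ("> ") in front of the log line.
        line = line.lstrip("> ").strip()
        match = CRC_PATTERN.match(line)
        if match:
            drive = match.group(1)
            counts[drive] = counts.get(drive, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "wd0: (aborted command, interface CRC error)\n"
        "wd0: soft error (corrected)\n"
    )
    print(count_crc_errors(sample))
```

A count that keeps climbing over a short period would point at cabling or
the UDMA mode rather than at an occasional glitch.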

> 
> In addition, we sometimes see the following disk/driver errors:
> 
> pciide0:1:0: bus-master DMA error: missing interrupt, status=0x20
> pciide0:1:0: device timeout, c_bcount=8192, c_skip0
> pciide0 channel 1: reset failed for drive 0
> wd0a: device timeout writing fsbn 8236512 of 8236512-8236527 (wd0 bn 
> 8236512; cn 8171 tn 2 sn 18), retrying
> pciide0:1:0: not ready, st=0x80, err=0x00
> pciide0 channel 1: reset failed for drive 0
> wd0a: device timeout writing fsbn 8236512 of 8236512-8236527 (wd0 bn 
> 8236512; cn 8171 tn 2 sn 18), retrying
> pciide0:1:0: not ready, st=0x80, err=0x00
> wd0a: device timeout writing fsbn 8236512 of 8236512-8236527 (wd0 bn 
> 8236512; cn 8171 tn 2 sn 18), retrying

This is more serious: it means the drive is stalled and doesn't
even honor the reset signal. I guess the drive doesn't recover from this?
Maybe it's a drive firmware issue, maybe it's just dying ...
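
For reference, the two status values in the log can be decoded bit by bit.
This is a small sketch, assuming the standard ATA status register layout
and the usual bus-master IDE status register layout; the bit names below
are the generic ones and may not match the macro names used in the NetBSD
driver.

```python
# ATA status register bits (st=... in the "not ready" message).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]

# Bus-master IDE status register bits (status=... in the DMA error message).
BM_STATUS_BITS = [
    (0x80, "SIMPLEX"),
    (0x40, "DRV1_DMA_CAPABLE"),
    (0x20, "DRV0_DMA_CAPABLE"),
    (0x04, "INTERRUPT"),
    (0x02, "ERROR"),
    (0x01, "ACTIVE"),
]


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("st=0x80     ->", decode(0x80, ATA_STATUS_BITS))
    print("status=0x20 ->", decode(0x20, BM_STATUS_BITS))
```

Under these assumptions, st=0x80 is BSY alone, i.e. the drive never leaves
the busy state after the reset, and status=0x20 has the interrupt bit
clear, which matches the "missing interrupt" message.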

I've seen this on occasion on sparc64 systems, where I suspect it's a
read/write reordering issue specific to that platform. But I've never
seen it on PCs.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference