Subject: Re: HDD - SMART status
To: None <current-users@netbsd.org>
From: Juha Hyttinen <jthyttin@lce.hut.fi>
List: current-users
Date: 04/28/2003 16:12:47
On Sun, 27 Apr 2003, Manuel Bouyer wrote:

> On Sat, Apr 26, 2003 at 08:45:49AM -0600, Jim Bernard wrote:
> > > On Sat, Apr 26, 2003 at 01:54:16AM +0200, Tomasz Luchowski wrote:
> > > > Hi,
> > > > in syslog:
> > > > 
> > > > Apr 26 01:42:48 zunpc /netbsd: pciide0:0:0: lost interrupt
> > > > Apr 26 01:43:00 zunpc /netbsd:  type: ata tc_bcount: 16384 tc_skip: 0
> > > > Apr 26 01:43:00 zunpc /netbsd: pciide0:0:0: bus-master DMA error: missing interr
> > > > upt, status=0x21
> > > > Apr 26 01:43:00 zunpc /netbsd: wd0e: DMA error reading fsbn 13069678 of 13069678
> > > > -13069709 (wd0 bn 13069741; cn 12966 tn 0 sn 13), retrying
> > > > Apr 26 01:43:00 zunpc /netbsd: wd0: soft error (corrected)

> The error reported by Tomasz is a DMA protocol error between the drive and
> the host. The drive itself didn't report an error (so maybe it could read
> the data fine), but the DMA engine said it failed to transfer the data
> from the drive. This is a problem on the bus.

FWIW, we got error messages like the ones below when using a "slim" Maxtor
HD in a custom server built by our vendor. By slim I mean its height was
only about 2/3 that of a regular HD, which unfortunately means a weaker
"frame" for the HD. It was nothing fancy, just a 20GB disk for the system.

The problem was related to overtightened screws on the hard disk: when I
opened the case and retightened the screws myself (they were really
tight), the error messages became much less frequent (but didn't go away
entirely).

The HD was inside a 3Ware "ATA Cage", which probably made the problem worse
(it's a lot more rigid than a PC chassis, so overtightening bends the HD
more).

	http://www.3ware.com/products/ata.asp

We didn't push the matter any further, just replaced the HD with "a
regular one", and after that things got back to normal. No more errors
since. I'm happy to provide more details if requested (although I sent
this mail for purely informational purposes ;)

(Please note that some lines may get wrapped.)

Feb 22 08:31:16 tereus /netbsd: pciide0:0:0: lost interrupt
Feb 22 08:31:17 tereus /netbsd: type: ata tc_bcount: 8192 tc_skip: 0
Feb 22 08:31:17 tereus /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
Feb 22 08:31:17 tereus /netbsd: pciide0:0:0: device timeout,  c_bcount=8192, c_skip0
Feb 22 08:31:17 tereus /netbsd: wd0h: device timeout writing fsbn 8408864 of 8408864-0 (wd0 bn 44060816; cn 43711 tn 2 sn 2), retrying
Feb 22 08:31:17 tereus /netbsd: wd0: soft error (corrected)
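
For anyone wanting to check whether such messages become more or less
frequent after a hardware change, here is a minimal sketch that counts them
per day. It assumes syslog lines in the format shown above; the function
name, the pattern names, and the choice of which messages to count are
illustrative assumptions, not anything from the NetBSD tools.

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted above. Which messages
# to count, and the labels used for them, are illustrative assumptions.
PATTERNS = {
    "lost interrupt": re.compile(r"pciide\d+:\d+:\d+: lost interrupt"),
    "DMA error": re.compile(r"bus-master DMA error"),
    "device timeout": re.compile(r"pciide\d+:\d+:\d+: device timeout"),
    "soft error": re.compile(r"wd\d+: soft error"),
}


def count_disk_errors(lines):
    """Return a Counter keyed by (day, kind) for matching syslog lines."""
    counts = Counter()
    for line in lines:
        # The first two fields of a syslog line are month and day, e.g. "Feb 22".
        day = " ".join(line.split()[:2])
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, kind)] += 1
    return counts
```

It could be fed an open log file, e.g. `count_disk_errors(open(path))`,
where `path` is wherever syslog writes kernel messages on the machine in
question. Log lines that were wrapped in transit would need to be rejoined
first.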


-- 
Juha Hyttinen
LCE sysadmin group
cell +358 50 35 35 457