Subject: Re: wd interface CRC errors
To: David Maxwell <david@vex.net>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 10/20/2006 10:55:14
David Maxwell writes:
> 
> I have a NetBSD 2.0.2, i386, with an uptime of 238 days. The install
> is about a year and a half old, running 24/7 since May 2005.
> 
> It has three WDs, and wd1/wd2 are a mirror. Nothing has changed
> physically in the system, so the usual 'cable problem' suggestion 
> doesn't seem to apply.

If you haven't already, I'd check that the cables are seated all the way... 
(I've had them work their way out a little over time, and cause these 
sorts of issues...)

> These started in September, and have become common:
> 
> Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of 1114304-1114319
> (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying
> Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
> Sep 19 08:58:29 mail /netbsd: wd0: soft error (corrected)
> 
> They show up on wd0 and wd1, which share a controller, and a cable. 
> All errors are corrected so far. (18 on wd0, 19 on wd1) 
> 
> All of the errors occur while writing, and there's no locality 
> amongst the sectors involved in the errored writes.

Are there any "time-of-day" correlations (e.g. when /etc/daily is running)? 
Those might point to a heavy disk load (and hence power draw), and 
possibly to a power supply that is starting to fail. 
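
One rough way to look for such a correlation, assuming the kernel messages 
are available as plain syslog lines in the format quoted above, is to tally 
the "interface CRC error" lines per drive and hour.  This is only a sketch; 
the function name and the regular expression are my own and may need 
adjusting for your log format:

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)
# The pattern is an assumption based on the log excerpt quoted in this thread.
CRC_LINE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s.*\b(wd\d+): .*interface CRC error"
)

def crc_errors_by_hour(lines):
    """Return a Counter mapping (drive, hour) -> number of CRC error lines."""
    counts = Counter()
    for line in lines:
        m = CRC_LINE.search(line)
        if m:
            hour, drive = int(m.group(1)), m.group(2)
            counts[(drive, hour)] += 1
    return counts

# The three lines quoted earlier in the thread; only the CRC line is counted.
sample = [
    "Sep 19 08:58:28 mail /netbsd: wd0a: error writing fsbn 1114304 of "
    "1114304-1114319 (wd0 bn 1114367; cn 1105 tn 8 sn 23), retrying",
    "Sep 19 08:58:29 mail /netbsd: wd0: (aborted command, interface CRC error)",
    "Sep 19 08:58:29 mail /netbsd: wd0: soft error (corrected)",
]

for (drive, hour), n in sorted(crc_errors_by_hour(sample).items()):
    print(f"{drive} {hour:02d}:00  {n}")
```

Feeding it the whole log (e.g. an open file object) instead of `sample` 
gives the per-hour picture; a cluster around the time /etc/daily runs would 
support the load/power theory.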
 
> The smart status shows a high count on wd0 for raw read error rate and
> hardware ECC recovered errors, so I'm inclined to replace that drive.

Be careful with these numbers from the SMART info... I've got a few Seagate
drives where the raw read error rate and hardware ECC recovered error rate 
move in lock-step, at a rate of 6/second when the drive is idle (and much 
higher when the drive is active).  I don't have the URLs handy, but 
this is apparently a 'known issue' with some Seagate drives... 
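
To check whether your drive behaves the same way, you could take two 
readings of the raw values some seconds apart and compare the per-second 
increase.  A minimal sketch follows; how the readings are obtained is left 
out, and the attribute names and sample numbers are made up purely for 
illustration:

```python
def counter_rates(before, after, seconds):
    """Per-second increase of each raw counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {name: (after[name] - before[name]) / seconds for name in before}

def in_lock_step(rates, a, b, tolerance=0.0):
    """True if counters a and b grew at the same rate, within tolerance."""
    return abs(rates[a] - rates[b]) <= tolerance

# Hypothetical readings taken 60 seconds apart on an idle drive.
before = {"raw_read_error_rate": 1000, "hardware_ecc_recovered": 1000}
after = {"raw_read_error_rate": 1360, "hardware_ecc_recovered": 1360}

rates = counter_rates(before, after, 60)
print(rates)
print(in_lock_step(rates, "raw_read_error_rate", "hardware_ecc_recovered"))
```

If the two counters climb at the same steady rate even when the drive is 
idle, the high raw values are probably not a sign of a failing disk.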

You might run some of the tools from sysutils/smartmontools to see if 
they give any more info (and/or run the SMART diagnostic bits...).

Later...

Greg Oster