Subject: Re: wd.c crashes/hard errors
To: Dirk Steinberg <steinber@machtnix.ert.rwth-aachen.de>
From: John F. Woods <jfw@ksr.com>
List: current-users
Date: 02/10/1994 12:14:04
> Michael> [various problems with Quantum LPS IDE hard drives
> Michael> deleted]
> Michael> I'd just like to insert a data point here that I have
> Michael> been running 386BSD 0.0 and 0.1, NetBSD 0.8, 0.9 and
> Michael> current on one and then two Conner CP3104 IDE hard drives
> Michael> for about two years now.  And, although I used to get the
> Michael> hang bug occasionally (which was fixed in mid-December
> Michael> with the timeout patches to wd.c), I've never had the
> Michael> damage to my filesystem you guys are talking about.
> Michael> Maybe it's specifically that drive that has problems?

> Maybe. But the drive otherwise has an excellent reputation. I know
> many people, including myself, who have used this drive extensively
> for DOS, Windows, OS/2 and Linux (the latter quite a lot) without any
> problems whatsoever. So it cannot be just the drive's fault. This
> explanation is way too simple.

I think he may have meant that particular unit, but I wouldn't necessarily
discount a drive bug.  First, that it works under DOS and Windows is weak
evidence, since they aren't likely to stress the drive as hard (especially
if the Windows driver is a warmed-over DOS driver).  Second, the Linux driver
might simply manage to avoid tickling a bug that the NetBSD driver hits.

As to whether or not drives have bugs, two stories:  (1) very recently, it was
reported in one of the Mac groups that LaCie (a drive reseller owned by
Quantum) had stopped shipping the ProDrive 340S drives because of a bug that
would cause data corruption.  (2) A few months ago, when my ancient UNIX
system died, I took its Maxtor 7245S drive out and put it into my wife's
Macintosh; the next morning, the background disk optimizer complained that
a sector had failed a read-several-times sanity check, returning different
data on different read operations.  I verified the behavior on the NetBSD
system: reading each sector 3 times, I sometimes got correct data from a
*different* sector.  The outfit I bought the drive from was skeptical, since
their diagnostic tool (run on the Mac) found no problem; they eventually
agreed to exchange the drive.  As luck would have it, though, KSR was working
with Maxtor on some problems with the MXT-1240S series, so the Maxtor rep took
a look at the drive and, sure enough, was able to reproduce the problem.
After the rep talked it over with the developers at Maxtor, Maxtor sent a new
microcode ROM, which fixed it.  (They hadn't seen my particular problem, but
had fixed other problems in managing the on-disk cache and figured it was
worth a shot.)
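For anyone who wants to try a read-several-times check on their own drive, here is a rough Python sketch of that kind of test.  It is hypothetical, not the program used above: the function name, the 512-byte sector size and the device path are all assumptions.  Point it at a raw (character) disk device, something like /dev/rwd0d on NetBSD, so that repeated reads are not simply served from the kernel's buffer cache; on a regular file, every re-read after the first would come from memory.  It only flags sectors whose reads disagree; it does not try to work out which other sector the bad data came from.

```python
import os
import sys

# Assumption: 512-byte sectors, as on IDE/SCSI disks of the era.
SECTOR_SIZE = 512


def find_unstable_sectors(path, passes=3, sector_size=SECTOR_SIZE, max_sectors=None):
    """Read each sector `passes` times; return the numbers of sectors
    whose reads did not all return identical data."""
    unstable = []
    fd = os.open(path, os.O_RDONLY)
    try:
        sector = 0
        while max_sectors is None or sector < max_sectors:
            offset = sector * sector_size
            first = os.pread(fd, sector_size, offset)
            if not first:
                break  # past the end of the device or file
            # Re-read the same sector and compare against the first read.
            for _ in range(passes - 1):
                if os.pread(fd, sector_size, offset) != first:
                    unstable.append(sector)
                    break
            sector += 1
    finally:
        os.close(fd)
    return unstable


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 sectorcheck.py /dev/rwd0d
    bad = find_unstable_sectors(sys.argv[1])
    for s in bad:
        print(f"sector {s}: reads disagree")
    print(f"{len(bad)} unstable sector(s) found")
```

Scanning a whole disk one 512-byte request at a time is slow; the max_sectors parameter limits the scan to the start of the device.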

So drives can have bugs, and different drivers can either never trigger those
bugs or stumble over them as often as you like.  (I suspect that APS' disk
test program read the disk in huge chunks, avoiding the cache -- a guess
bolstered by the fact that it ran a heck of a lot faster than my test did.)
