Subject: Re: SBC probs
To: Chris Mason <cmason@nando.net>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: port-mac68k
Date: 08/30/1996 10:42:30
> What makes you think it's failing??  This disk is less than 6 months old

Well, to repeat the basic idea, there isn't anything NetBSD can do to cause
media failure reports from the drive; if you've accumulated 20 bad sectors in
6 months, that's much more than the drive ought to generate (a couple per
year is much more reasonable).  One possibility, though, is that if you
ever formatted in such a way as to erase the manufacturer's defect list,
you could be rediscovering all those bad sectors.

> I've been using it regularly for about a month now.  I've done several
> formats on the drive, each followed by what seemed to be extensive media
> checks (sequential _and_ random read _and_ write, it took almost 3 hours),
> which discovered no errors.

Random reads and writes aren't important for verifying the media surface;
they're useful if you don't trust the drive to figure out what cylinder it's
on, though.  Ideally, you want a program that reads and writes each sector
multiple times with different data.  Carefully-chosen test data would be best,
but since the right data will be different for each disk drive, and potentially
for each major revision of a given disk drive(*), no way will you find a format
utility that is optimized to find defects.
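
A minimal sketch of that kind of test, in Python, under some loud assumptions:
512-byte sectors, four generic bit patterns (not tuned to any drive), and a
made-up function name, pattern_test.  It is DESTRUCTIVE (it overwrites every
sector), and on most systems the read-back may be served from the OS cache
rather than the media, so a real tool would go through a raw device or direct
I/O.  Treat it as an illustration of the idea, not a certified media test.

```python
import os

SECTOR_SIZE = 512  # assumed sector size; real drives may differ

# Hypothetical generic patterns; NOT tuned to any particular drive.
PATTERNS = (b"\x00", b"\xff", b"\xaa", b"\x55")


def pattern_test(path, sector_size=SECTOR_SIZE, patterns=PATTERNS):
    """Destructively write each pattern to every sector, then read it back.

    Returns the sorted list of sector numbers that failed any comparison.
    """
    bad = set()
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        sectors = size // sector_size
        for pat in patterns:
            block = pat * sector_size
            # Write pass: fill every sector with the current pattern.
            for n in range(sectors):
                os.lseek(fd, n * sector_size, os.SEEK_SET)
                os.write(fd, block)
            os.fsync(fd)  # push the writes out before reading back
            # Read pass: every sector must return exactly what was written.
            for n in range(sectors):
                os.lseek(fd, n * sector_size, os.SEEK_SET)
                if os.read(fd, sector_size) != block:
                    bad.add(n)
    finally:
        os.close(fd)
    return sorted(bad)
```

Calling pattern_test on a scratch file or a device you are willing to wipe
returns an empty list when every sector read back what was written.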

> I've tried several: Anubis, SCSI Director, FWB Primer, and an old version
> of PLI Formatter.  Which do you recommend?

Hmm, I've used FWB, Anubis, and SCSI Director, as well as APS Power Tools
(which I think may now be the same underlying formatter as Anubis).  The
folks at APS assure me that their media test function reads sectors multiple
times, but I had one disk drive failure they didn't spot which *was* spotted
by the DiskCheck utility that comes with DiskExpress II (which I highly
recommend if you use your Mac with MacOS).(*)  If "FWB Primer" is a cut-down
version of their full Hard Disk Toolkit, it might not have the same media
certification as the full toolkit, but I've been reasonably impressed with
the Hard Disk Toolkit (however, I don't know for sure what their media
test function does).

> How can I be sure that this is a hardware problem and not software??

Software should not be able to induce media errors.  If the NetBSD driver is
accurately reporting this (it's *possible* that it is misinterpreting some
other error, but I doubt it), then your disk is acquiring bad spots at an
alarming rate.  I don't think NetBSD generates SCSI bus resets, which might
curdle a drive if done at embarrassing moments.

One vague possibility, which I don't think I believe:  if the disk drive you
have is drawing more current than the Mac power supply is rated for, and
NetBSD is exercising the disk harder than MacOS does, then *maybe* you're
seeing occasional power-supply sags that are taking out sectors being written
at the time.  However, most disk drives report that in more dramatic ways than
by leaving curdled sectors on the disk, and the usual time you see this is when
you have an older Mac and a very high capacity disk drive of the same vintage
as said older Mac -- modern disk drives (especially one which is 6 months old)
draw hardly any current by historical standards (:-), and Mac power supplies
are generally more tolerant of system expansion than they were in the toaster
days...


(*) I once had a Maxtor 7245 disk which worked fine under UNIX for 6 months,
but when plugged into a Mac would fail the nightly DiskExpress verify, with
DiskExpress claiming that it successfully read *different data* on subsequent
reads.  I wrote a UNIX utility to simply read every sector of the disk three
times in a row, and sure enough, once in a while the disk would return data
from a different sector.  APS Power Tools failed to discover this problem;
they likely read the entire track at once (multiple times), and the problem
turned out to be a caching bug.  Just by chance, my employer at the time was
having regular visits from a Maxtor field engineer;  I mentioned the
problem to him, he said "nooo, it couldn't be doing that", and hooked it up
to his official Maxtor test rig.  I swear there is still a dent in the floor
where his jaw hit it...  He called up the engineering department, and they
said they hadn't seen THIS bug, but there had been a vaguely similar caching
bug which was fixed during the product lifecycle, and Fedexed out a new ROM.
The field engineer said that if the ROM didn't fix it, I could return the
drive to Maxtor for a replacement thanks to their "No Quibbles" guarantee,
but the ROM did cure the problem.  (It turns out that at the time, the
7245 drive wasn't being manufactured any more, but they were still shipping
them -- 7345 (245MB capacity versus 345MB capacity) drives with the extra
space disabled in firmware; if I went that route, the FE said he knew what
microcode variables to tweak to uncover the extra space, but again, the ROM
fixed it and the extra space wasn't worth the hassle.)
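
The read-three-times check described in the footnote can be sketched roughly
as follows.  This is a Python reconstruction of the idea, not the original
UNIX utility; the function names, the 512-byte sector size, and the default of
three reads are all assumptions.  It only reads, so it is non-destructive, but
the OS cache can hide a drive-level caching bug unless the reads actually
reach the drive (a raw device or direct I/O).

```python
import os


def reads_disagree(reads):
    """Return True if any read differs from the first one."""
    return any(r != reads[0] for r in reads[1:])


def find_unstable_sectors(path, sector_size=512, repeats=3):
    """Read every sector `repeats` times in a row, one sector at a time.

    Returns the list of sector numbers whose reads were not all identical.
    """
    unstable = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for n in range(size // sector_size):
            reads = []
            for _ in range(repeats):
                # Seek back to the same sector before each read.
                os.lseek(fd, n * sector_size, os.SEEK_SET)
                reads.append(os.read(fd, sector_size))
            if reads_disagree(reads):
                unstable.append(n)
    finally:
        os.close(fd)
    return unstable
```

Reading one sector at a time, rather than a whole track, is the point: it is
the per-sector repeat that exposed the wrong-sector data in the story above.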