Subject: SUMMARY: what to do about medium errors
To: None <port-i386@NetBSD.ORG>
From: Anne Bennett <anne@alcor.concordia.ca>
List: port-i386
Date: 08/17/1997 16:30:08

Late last night I wrote:

Anne> I've started seeing medium errors logged for my <QUANTUM, FIREBALL
Anne> ST3.2S, 0F0C> on an Adaptec (aic7880 Single Channel) controller under
Anne> NetBSD-1.2/i386.  [...]
Anne> Is it normal for a brand-new disk to give such errors, or should I
Anne> ask the vendor for a new one?  Should the SCSI code in NetBSD be
Anne> taking care of remapping bad sectors, should the disk be doing it
Anne> automatically, should I be using some program to do it?  

By this morning I had two helpful responses:

Carl  = Carl S Shapiro <cshapiro@sparky.ic.sunysb.edu>
Giles = Giles Lean <giles@nemeton.com.au>

Carl> You can either format (and remap bad sectors) or just verify the
Carl> media for bad sectors through the Adaptec SCSI BIOS.  Reboot your
Carl> machine, hit control-a right before the card probes for devices on the
Carl> SCSI bus.  Choose something like "SCSI disk utilities" (or something of
Carl> this nature, I can't exactly remember, as I haven't done this in a while).
Carl> From there you should be able to do the formatting, etc.

Carl> Dunno about the SCSI code handling remapping of bad sectors, but
Carl> I have bought several "new" SCSI disks that were full of bad sectors.


Giles> I'd be going the replacement route if I could.
Giles> 
Giles> The disk should [remap] automatically (unless this is shipped turned
Giles> off by the vendor, which is unlikely).  The disk should remap if:
Giles> 
Giles> o you write to a bad block
Giles> o you read a bad block that the drive tries and tries and tries and
Giles>   eventually manages to read
Giles> 
Giles> I have seen bad block re-mapping code for SCSI drives (might have
Giles> been for FreeBSD) but wouldn't bother with it for a new drive.
Giles> There might be something in the SCSI controller firmware, too,
Giles> depending on the controller.
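
For what it's worth, Giles's two conditions can be restated as a small
decision function.  This is only a toy sketch of the rule as he describes
it, not of any real drive's firmware; the names (Access, drive_should_remap,
read_eventually_succeeded) are made up for illustration.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


def drive_should_remap(access, block_is_bad, read_eventually_succeeded=False):
    """Toy restatement of the two remap conditions quoted above.

    All names here are hypothetical; this models the described rule only.
    """
    if not block_is_bad:
        # A good block never needs remapping.
        return False
    if access is Access.WRITE:
        # Condition 1: a write to a bad block.
        return True
    # Condition 2: a read of a bad block that succeeds after retries.
    # A read that never succeeds is not covered by the two cases above,
    # so this sketch returns False for it.
    return read_eventually_succeeded
```

Under this reading, a bad block that can no longer be read at all is not
remapped until something writes to it.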

I went the control-a route, and made the disk remap bad sectors.  While
the O/S had logged errors on only 45 sectors, by the time I had run the
utilities enough times to get three clean runs in a row (11 times!), 318
sectors had been remapped.  I also had problems running the utilities:
on three occasions, the utilities encountered an "unexpected timeout".
The first time this happened, I tried to get things back to normal with
ctl-alt-del, then with reset, then with a brief power cycle;  at first
the disk was not seen by the controller, then it reported "not ready",
and finally it reported a "hardware error".  Leaving the power off for
over a minute got things back to normal (so this is what I did the second
and third times I got the timeout while running the utilities).

In the end, and after a forced fsck and after recovering the files
which were (known to have been) corrupted, things are back to normal.
Thanks to Carl and Giles for getting back to me so quickly, and on a
week-end too.

The disk had been running nicely since I got the machine on July 26,
and logged its first error on Aug 14; the very flaky behaviour I saw
today, coupled with the 318 bad sectors found, gives me a Bad Feeling.  I'll
be contacting the vendor to see about replacing it before it flakes
out totally.

Anne.
-- 
Ms. Anne Bennett, Computing Services, Concordia University, Montreal H3G 1M8
anne@alcor.concordia.ca                                       (514) 848-7606