Subject: re: wd.c crashes/hard errors
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Douglas Crosher <dtc@stan.xx.swin.OZ.AU>
List: current-users
Date: 02/10/1994 12:53:11
>Date: Wed, 9 Feb 94 13:50:33 +0100
>From: steinber@machtnix.ert.rwth-aachen.de (Dirk Steinberg)
>Message-Id: <9402091250.AA06233@machtnix.ert.rwth-aachen.de>
>To: current-users@sun-lamp.cs.berkeley.edu
>Subject: wd.c crashes/hard errors
>
>Hi,
>
>yesterday it happened again: I was running a current-940207 system &
>kernel. I have a Quantum LPS 240 AT hard disk and since this one had
>problems with the -current wd.c, I doubled the WDCNDELAY (all this is
>from memory, for reasons that will become apparent soon) value (this
>was suggested some time ago on this list). So during normal operation,
>suddenly the kernel hung with repeated messages like this:
>
>wdc0: busy too long, resetting
>wdc0: busy too long, resetting
>wdc0: busy too long, resetting
>...

	I run NetBSD 0.9 and had a Quantum LPS 240 AT connected as the
second disk, with a WD340 as the first.  The machine ran for weeks
without a problem, then suddenly I started getting the above problem.
The root partition (which was on the WD340) was trashed; the other
partitions were OK.  I do not get this problem running either of the
drives alone, so I am now just using the WD340.

>
>As a side note, I am observing extra interrupts every so often. I
>always get one directly after (or maybe during?) the autoconfig phase:
>
>wdc0: extra interrupt

Yes, I get these also.

>
>I already had these types of crashes before, and every time a
>filesystem was damaged so badly that fsck couldn't repair it. This
>time it was the root filesystem...
>
>Even worse, when checking the fs after reboot, fsck hangs the system
>after:
>
>wd0a: hard error reading fsbn 10720 of 10720-10723

Yes, I got this too.  I find this very strange, as an IDE drive should
not give hard errors.  Typically the damage to the root partition was
so bad that I could not reboot!

>
>This error is persistent across reboots, power off, etc. Now since I
>have an IDE disk I shouldn't get hard errors. I never had any hard
>errors before, and my Linux partition still works fine. So my NetBSD
>installation is hosed for now. I sure hope this error goes away when I
>reinstall/re-mkfs. Is it actually possible that the faulty wd.c caused
>damage to my disk, or that it at least screwed up the low-level format
>on some track? If so, how could I reformat a single track without
>reformatting the entire disk? And how to format (low-level) an IDE disk
>in the first place? I know how it works for MFM/RLL/ESDI and SCSI
>disks and have done this many times before. But IDE disks?
>

	I was able to restore my system by doing a disklabel,
putting a clean fs on the root partition, and then reinstalling all the
files that were on that partition.  The disk errors did not reappear
until a few days later, when the f..ken thing crashed again.  This time
I removed the second drive and things have been fine.

Regards
Douglas Crosher
