Subject: Re: wd.c crashes/hard errors
To: Burgess, David (TSgt) ~U <BurgessD@J64.STRATCOM.AF.MIL>
From: Dirk Steinberg <>
List: current-users
Date: 02/11/1994 10:07:15
>>>>> "David" == David Burgess (TSgt) writes:

    David> I had a similar problem when I tried to install NetBSD
    David> after I bought my new IDE drive.  I would be working along,
    David> and the disk would hiccup.  It turns out that I had a bad
    David> spot in the swap space.  As soon as the system tried to
    David> swap out to the swap space, it would trash the block, but
    David> it wouldn't notify the system at all.  When the page was
    David> swapped back in, it would proceed to trash large portions
    David> of all kinds of stuff.  I have also had a similar problem
    David> with a bad SIMM that would lose its mind from time to time.
    David> It always amazed me how much of the system could get
    David> destroyed in a very short period of time.  All it takes is
    David> one or two warped pointers that used to look at memory
    David> resident disk structures for the system to go away.

    David> If I had any suggestions, I would say to look closely at
    David> your hard drive.  I suspect that, although IDE isn't
    David> supposed to have 'bad media', your drive may have a bad
    David> spot.  I have also (anecdotally, of course) noticed that

I understand the point that you and some other people on this list are
making. But I do not believe that my hardware is defective (at least
not before I gave NetBSD a chance to mess around with it), because I
have used the *same* hardware, even the *same* partition, quite
extensively for Linux and *never* had *any* problems. Even if it *were*
really the drive's fault, but it only showed up under NetBSD, and
not under DOS, Windows, OS/2 2.1, Linux, NT, Mach, ISC, whatever, I
would still argue that it was a bug in NetBSD, because it would then
be proven that it is *possible*, by correct/tricky/hackerish
programming, to operate this class of hardware (Quantum LPS 240 AT)
without such dramatic failure modes.

    David> virtually every one of my hard drive 'hangs' is because of
    David> disk re-reads taking the controller too long.  If that was
    David> truly the reason, that would explain why some people have
    David> the problem and others do not.  It would be a function of

Independently of the question about the crashes, I do believe this
to be true. What is the "correct" fix for these "controller taking too
long" errors? Increasing WDCNDELAY? By how much?
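
To make the question concrete, here is a minimal sketch of the kind of
bounded polling loop that such a constant limits. It is not the actual
wd.c code: POLL_LIMIT (standing in for WDCNDELAY), read_status() and
the simulated busy_polls counter are assumptions made up for this
example. Only the status bit values (busy 0x80, ready 0x40) follow the
usual AT disk register layout.

```c
#include <assert.h>

/* Hypothetical values for illustration; not copied from any real wd.c. */
#define POLL_LIMIT   100000L  /* max polls; plays the role of WDCNDELAY */
#define STATUS_BUSY  0x80     /* busy bit of the AT disk status register */
#define STATUS_READY 0x40     /* drive-ready bit */

/* Simulated controller: reports busy for this many reads, then ready. */
static long busy_polls;

/* Stand-in for reading the controller's status port. */
static unsigned char read_status(void)
{
    if (busy_polls > 0) {
        busy_polls--;
        return STATUS_BUSY;
    }
    return STATUS_READY;
}

/*
 * Poll until the busy bit clears.
 * Returns the number of busy polls seen, or -1 if the limit ran out.
 */
static long wait_not_busy(long limit)
{
    long i;

    assert(limit > 0);
    for (i = 0; i < limit; i++) {
        if ((read_status() & STATUS_BUSY) == 0)
            return i;
        /* A real driver would pause for a fixed interval here. */
    }
    return -1; /* controller took too long */
}
```

In this sketch, a drive that stays busy for longer than the limit is
reported as a timeout even though it might have finished its retries a
moment later. Raising the limit only lengthens the wait before giving
up; it does not help if the drive never responds at all.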

    David> the controller (and its ability to remap flakey sectors) or
    David> the hard drive internals (depending on their ability to
    David> recover) as to whether or not the controller locks up or
    David> not.  It would also be as random as it appears to be.
    David> Also, if your controller remaps bad spots on the fly, it is
    David> just possible that the drive may be initializing the
    David> replacement with incorrect information from the drive.

    David> Of course, this is all about as authoritative as an X-Men
    David> comic, but I would like to tell those of you that are
    David> having these problems that I too have had them, and have
    David> overcome them through 'bad144' and use of MANY bad spots on
    David> the disk being identified whether they were really (versus
    David> correctably) bad.

    David> TSgt Dave Burgess

All in all, I find this very strange. I'll try bad144 next time.
Does it also take care of these NVHE(tm) (NetBSD Virtual
Hardware Errors) inside the swap partition?



PS: Someone mentioned that FreeBSD has a new wd.c driver. Maybe it
    would be worthwhile taking a look at it. Is it a complete

Dirk W. Steinberg - RWTH Aachen - Internet email:
Aachen University of Technology / IS2-Integrated Systems in Signal Processing
Rhein.Westf.Tech.Hochsch. Aachen / Integrierte Systeme der Signalverarbeitung
Templergraben 55 / D-52056 Aachen / phone:+49 241 807879 / fax:+49 241 807631
Home address: Kleikstr. 63, D-52134 Herzogenrath,Germany/phone: +49 2406 7225