Subject: Re: wd.c crashes/hard errors
To: Burgess, David (TSgt) ~U <BurgessD@J64.STRATCOM.AF.MIL>
From: Dirk Steinberg <steinber@schoenfix.ert.rwth-aachen.de>
List: current-users
Date: 02/11/1994 10:07:15
>>>>> "David" == TSgt <Burgess> writes:
David> I had a similar problem when I tried to install NetBSD
David> after I bought my new IDE drive. I would be working along,
David> and the disk would hiccup. It turns out that I had a bad
David> spot in the swap space. As soon as the system tried to
David> swap out to the swap space, it would trash the block, but
David> it wouldn't notify the system at all. When the page was
David> swapped back in, it would proceed to trash large portions
David> of all kinds of stuff. I have also had a similar problem
David> with a bad SIMM that would lose its mind from time to time.
David> It always amazed me how much of the system could get
David> destroyed in a very short period of time. All it takes is
David> one or two warped pointers that used to look at memory
David> resident disk structures for the system to go away.
David> If I had any suggestions, I would say to look closely at
David> your hard drive. I suspect that, although IDE isn't
David> supposed to have 'bad media', your drive may have a bad
David> spot. I have also (anecdotally, of course) noticed that
I understand the point that you and some other people on this list are
making. But I do not believe that my hardware is defective (at least
not before I gave NetBSD a chance to mess around with it), because I
have used the *same* hardware, even the *same* partition quite
extensively for Linux and *never* had *any* problems. Even if it *were*
really the drive's fault, but it only showed up under NetBSD, and
not under DOS, Windows, OS/2 2.1, Linux, NT, Mach, ISC, whatever, I
would still argue that it is a bug in NetBSD, because it would then
be proven that it is *possible*, by correct/tricky/hackerish
programming, to operate this class of hardware (Quantum LPS 240 AT)
without such dramatic failure modes.
David> virtually every one of my hard drive 'hangs' is because of
David> disk re-reads taking the controller too long. If that was
David> truly the reason, that would explain why some people have
David> the problem and others do not. It would be a function of
Independently of the question about the crashes, I do believe this
to be true. What is the "correct" fix for these "controller taking too
long" errors? Increasing WDCNDELAY? How much?
David> the controller (and its ability to remap flakey sectors) or
David> the hard drive internals (depending on their ability to
David> recover) as to whether or not the controller locks up or
David> not. It would also be as random as it appears to be.
David> Also, if your controller remaps bad spots on the fly, it is
David> just possible that the drive may be initializing the
David> replacement with incorrect information from the drive.
David> Of course, this is all about as authoritative as an X-Men
David> comic, but I would like to tell those of you that are
David> having these problems that I too have had them, and have
David> overcome them through 'bad144' and use of MANY bad spots on
David> the disk being identified whether they were really (versus
David> correctably) bad.
David> TSgt Dave Burgess
All in all I find this very strange. I'll try with bad144 the next
time. Does it also take care of these NVHE(tm) (NetBSD Virtual
Hardware Errors) inside the swap partition?
Frustrated,
Dirk
PS: Someone mentioned that FreeBSD has a new wd.c driver. Maybe it
would be worthwhile taking a look at it. Is it a complete
re-design?
-----------------------------------------------------------------------------
Dirk W. Steinberg - RWTH Aachen - Internet email: steinber@ert.rwth-aachen.de
Aachen University of Technology / IS2-Integrated Systems in Signal Processing
Rhein.Westf.Tech.Hochsch. Aachen / Integrierte Systeme der Signalverarbeitung
Templergraben 55 / D-52056 Aachen / phone:+49 241 807879 / fax:+49 241 807631
Home address: Kleikstr. 63, D-52134 Herzogenrath,Germany/phone: +49 2406 7225
------------------------------------------------------------------------------