Subject: Re: seg fault on install
To: Julio Merino <jmmv@hispabsd.org>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 05/07/2002 14:00:29
[ On Tuesday, May 7, 2002 at 17:27:37 (+0200), Julio Merino wrote: ]
> Subject: Re: seg fault on install
>
> On Tue, May 07, 2002 at 03:35:27PM +0200, Thomas Runge wrote:
> > 
> > Smells a bit like the mystical Sig11 problem:
> > 
> > http://www.bitwizard.nl/sig11/
> 
> Gggrrr can't connect.

If/when you do I think you'll find that the only real fix is to upgrade
your system and/or memory to ECC (and make sure your motherboard does
the right thing to allow the OS to differentiate between corrected and
uncorrected errors).

> If memory was faulty, it should fail more usually... and my -current system
> is quite stable (despite some problems with X).

Memory problems are quite fickle.  People who have claimed their
machine passes every test under the sun, and who say they have never had
a core dump or other unexpected behaviour from any other program will
suddenly start complaining about some large CPU+RAM hog that dies
without explanation, and then might run to completion without error on
the very same job....

I have a more-or-less no-name Pentium-Pro machine with 36-bit RAM
which supposedly has full ECC support (and it's enabled in the BIOS).
It passes memtest even when it's run for many hours.  However I
occasionally have compiles that fail and then work just fine when
re-run.  Only on extremely rare occasions does the machine drop down into
the debugger on an NMI, but I don't know enough about the hardware to be
able to figure out whether it's a corrected or an uncorrected error.
I've never had an NMI though when any other program has suffered
unexplained failures....

Now maybe the program failures are due to massive multi-bit failures that
the ECC hardware fails to notice, but I seriously doubt I could have
more such failures while not at the same time at least seeing more
"corrected" errors.  Maybe I'm not seeing any corrected errors though --
maybe the NMI is only for uncorrected errors?
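
To make the corrected/uncorrected/unnoticed distinction concrete, here
is a rough sketch of a toy SECDED code (single-error-correct,
double-error-detect).  It is an assumption-laden illustration only: it
protects 4 data bits with 4 check bits, whereas real ECC memory
typically protects 64 data bits with 8 check bits, and the names
encode(), decode() and flip() are made up for this example.  It says
nothing about how any particular chipset reports errors.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit word: Hamming(7,4) plus overall parity."""
    w = [0] * 8                       # index 0 = overall parity, 1..7 = Hamming positions
    w[3], w[5], w[6], w[7] = [(nibble >> i) & 1 for i in range(4)]
    w[1] = w[3] ^ w[5] ^ w[7]         # check bit covering positions 1,3,5,7
    w[2] = w[3] ^ w[6] ^ w[7]         # check bit covering positions 2,3,6,7
    w[4] = w[5] ^ w[6] ^ w[7]         # check bit covering positions 4,5,6,7
    for pos in range(1, 8):
        w[0] ^= w[pos]                # make the parity of the whole word even
    return w


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    overall = 0
    for pos in range(8):
        if w[pos]:
            syndrome ^= pos           # position 0 contributes nothing to the syndrome
            overall ^= 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: assume exactly one bit flipped and flip it back.
        # (Three flipped bits also land here and get "corrected" wrongly.)
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped, cannot fix.
        status = "uncorrectable"
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, nibble


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    w = list(word)
    for pos in positions:
        w[pos] ^= 1
    return w


good = encode(0b1011)
print(decode(flip(good, 5)))            # one bad bit: corrected
print(decode(flip(good, 2, 6))[0])      # two bad bits: uncorrectable
print(decode(flip(good, 0, 3, 5, 6)))   # four bad bits: looks fine, data is wrong
```

In this toy code a single flipped bit is fixed, two flipped bits are
detected but not fixed, and some patterns of four flipped bits produce
another valid code word, so the decoder reports nothing at all while
handing back wrong data.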

I _HATE_ PC-crap hardware!  ;-)  (even when it's constructed to look
like a real server!  :-)

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>