Subject: Re: Crashes under "load".
To: Frank van der Linden <fvdl@netbsd.org>
From: Richard Rauch <rkr@olib.org>
List: port-amd64
Date: 06/16/2004 16:01:58
On Wed, Jun 16, 2004 at 10:01:42PM +0200, Frank van der Linden wrote:
> On Wed, Jun 16, 2004 at 02:00:58PM -0500, Richard Rauch wrote:
> > I have also occasionally noticed that builds from pkgsrc or system sources
> > abort with either unexplained coredumps (resuming works fine) or else
> > abort with strange, garbled "no such file" messages.  (The garble implying
> > to me that the file names are corrupted somewhere.)
> 
> Hm.. that sounds like bad memory..
Perhaps.  Things were rock solid for a long time, then I upgraded
BIOS and NetBSD concurrently and reliability became awful.
Reverting BIOS fixed most of it.  I haven't tried locating a past
version of NetBSD that is more reliable to see if there is anything
on that front.
> In any case, what I was talking about lately were the spontaneous
> resets (no DDB, no crashdumps). I believe that they are fixed now.
Sometimes the system just reboots.  Sometimes it freezes having
synced disks and waits for me to press a key before rebooting.
I haven't found any crashdumps (I have never had one; I assume
that they would be in /var/crash if I had any, yes?).
At some points in the past, I believe that I've had to resort to
the reset button or toggling the power, but not recently.
> Your problem smells a bit like a hardware problem. I have done
> some fairly heavy compiling over NFS without problems, I'll try so
> again.
I can't rule it out, obviously.  As for NFS, that was just an offhand
thought.
As I noted elsewhere, Linux claims to have identified an "errata #93"
in the (K8) motherboard which the kernel thinks can/should be fixed by
a BIOS upgrade.  This is true, even with the latest BIOS I have
been able to get.  But that's been gone over before, so I only mention
it as a reference in case it is related.
As I've also noted before, the system is practically useless if
ioapic is enabled.  I don't know if that has been fixed recently, but
it may be worth building a GENERIC kernel and seeing if that issue
has been closed...  The NFS mount, if this is possibly related, is
over an Intel fxp-using NIC that is unusable with ioapic enabled
(but which appears to function fully normally with ioapic disabled).
Is it possible that some of my problems come from being forced to
disable ioapic?
-- 
  "I probably don't know what I'm talking about."  http://www.olib.org/~rkr/