Current-Users archive


Re: NetBSD 5.1 RC3 in production



On Sun, Sep 19, 2010 at 12:44:14PM +0700, Robert Elz wrote:
> 
> I'm pretty sure hardware errors don't cause tstile lockups, and while
> anything is possible for the crashes, it would be a huge coincidence if
> some hardware component just happened to break when NetBSD 5 was installed.
> It is certainly not impossible, but it isn't my first guess.

It is much more likely -- if there is a hardware error involved at all --
that NetBSD 5 exhibits problems where NetBSD 4 did not simply because NetBSD
5 stresses the synchronization and coherence facilities provided by the
hardware much more than NetBSD 4 did, even on uniprocessors.

This simple fact revealed hardware issues so severe on early AMD Opterons
(and even some later ones) that we ended up abandoning an entire generation
of hardware for our build cluster, rather than chasing our tails over and
over again trying to figure out which microcode patches were wanted, which
features had to be disabled, and which software workarounds we needed where.

> If it happened often enough that I could expect to catch it, I'd enable
> ddb and find out what happened, but the crashes aren't that frequent,
> the system isn't unusable by any means.   I don't want (can't allow) it
> to crash sometime and sit in ddb for hours while I sleep...

Why not hook up a serial console and just look at what the kernel logs?
Are you not getting kernel cores?

Thor

