Subject: Re: Ultra 5 / 2.0 / panic: lockmgr: no context
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Gert Doering <gert@greenie.muc.de>
List: port-sparc64
Date: 01/14/2005 09:43:32
Hi,

On Wed, Jan 12, 2005 at 03:47:06PM +0100, Manuel Bouyer wrote:
> In my case, it was in an air-conditioned machine room, and was not moved.
> It started failing after 2 years of good work (it was running Solaris).

I'm afraid your suspicions might be true - the crashes look different
every day, and yesterday I also got a RED STATE EXCEPTION - bah.

Today was interesting.  The machine was unresponsive, with no message at
all on the console.  After a BREAK, I got the db> prompt, but I'm not
really sure what the "right" way is to figure out what the machine was
doing at that point in time.  "bt" shows me something that looks more like
"you're here because you sent a console <BREAK>":

kdb breakpoint at 1277b68
Stopped at      netbsd:cpu_Debugger+0x4:        nop
db>
db> bt
sab_intr(24c7700, 0, e0017ed0, 0, 12482bc, 768) at netbsd:sab_intr+0xa4
sparc64_ipi_flush_all(0, 0, 12af8a0, 0, ffffffffffffffff, 18aa5c8) at netbsd:sparc64_ipi_flush_all+0x23c
db>

Is this assumption correct?

Is there a tutorial on how to get the most useful information out of ddb
at this stage?  (The ddb man page describes the commands, but I'm new
to kernel debugging, and a bit lost - sorry.)

What I'll definitely do at this stage is "open machine, clear out dust,
re-seat DRAM and CPU" and see whether this will magically fix everything.

gert

-- 
USENET is *not* the non-clickable part of WWW!
                                                           //www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert@greenie.muc.de
fax: +49-89-35655025                        gert@net.informatik.tu-muenchen.de