Subject: Re: Ultra 5 / 2.0 / panic: lockmgr: no context
To: Gert Doering <gert@greenie.muc.de>
From: Eduardo Horvath <eeh@NetBSD.org>
List: port-sparc64
Date: 01/14/2005 19:54:06

On Fri, Jan 14, 2005 at 09:43:32AM +0100, Gert Doering wrote:
> Hi,
> 
> On Wed, Jan 12, 2005 at 03:47:06PM +0100, Manuel Bouyer wrote:
> > In my case, it was in an air-conditioned machine room, and was not moved.
> > It started failing after 2 years of good work (it was running solaris).
> 
> I'm afraid your suspicions might be true - the crashes look different
> every day, and yesterday I also got a RED STATE EXCEPTION - bah.
> 
> Today was interesting.  Machine was unresponsive, but no message at all
> on the console.  After a BREAK, I got into the db>, but I'm not really
> sure what the "right" way is to figure out what the machine was doing
> at this point in time.  "bt" shows me something that looks more like
> "you're here because you sent a console <BREAK>":
> 
> kdb breakpoint at 1277b68
> Stopped at      netbsd:cpu_Debugger+0x4:        nop
> db>
> db> bt
> sab_intr(24c7700, 0, e0017ed0, 0, 12482bc, 768) at netbsd:sab_intr+0xa4
> sparc64_ipi_flush_all(0, 0, 12af8a0, 0, ffffffffffffffff, 18aa5c8) at netbsd:sparc64_ipi_flush_all+0x23c
> db>
> 
> is this assumption correct?

Yes.  The sab_intr is almost certainly due to the BREAK.

I'm a bit surprised to see sparc64_ipi_flush_all() on the stack.  It might
be worth checking whether the processor was really in that routine before
the BREAK, or whether there's a symbol lookup issue and the processor was
actually in some other routine.  If it was in sparc64_ipi_flush_all(), it
could have gotten into some sort of infinite loop there...

> Is there a tutorial around how to get maximum useful information from ddb
> at this stage (the "man ddb" man page describes the commands, but I'm new
> to kernel debugging, and a bit lost - sorry).

Take a look at the machine-dependent debug commands.  (At the DDB prompt,
type `mach', or look at .../sparc64/db_interface.c.)  They are extremely
useful.  In this case, what you should try to do is track down and dump
the trapframes.  Pointers to the trapframes are passed to the C-language
trap handlers from the assembly stubs.  Unfortunately, the location of
that parameter varies depending on the particular handler.  (I was
planning to normalize the trap handler call signatures at some point but
never found the time.)

Sun used to sell a book called _Panic!_ by Chris Drake that describes
the art of crash dump analysis on 32-bit SPARC.  It describes techniques
to use for SPARC v8 on Solaris with adb, but most of them can also be
applied to SPARC v9 with DDB if you know the differences in architecture
and command syntax.


Eduardo