Subject: Re: kgmon -b causes a reboot (no kernel profiling)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Gary Thorpe <gathorpe79@yahoo.com>
List: current-users
Date: 09/07/2006 22:31:38
--- Bill Studenmund <wrstuden@netbsd.org> wrote:

> On Thu, Aug 31, 2006 at 11:09:47PM -0400, Gary Thorpe wrote:
> > Hi,
> > 
> > I was recently attempting to profile a kernel under different
> > conditions. However, each time I try to enable profiling using
> > 'kgmon -b' the machine reliably reboots (no ddb or panic).
> > 
> > At first, I thought it was because 'kgmon' was from 3.0, but the
> > current version produces the same result as well. Is this just
> > something wrong with my source tree/actions or is this universally
> > repeatable?
> 
> No idea about repro, but I'm going to guess that the problem is that
> a routine is getting profiled that shouldn't. There are a few
> routines which aren't profiled even when you're profiling. They are
> the routines involved in profiling itself.
> 
> So my guess is that a routine called as part of profiling is getting
> profiled, which triggers a recursion, which makes the stack explode,
> which can cause the box to just reboot.
> 
> A main problem is that anything to fix this, such as a stack guard
> page, will trigger uvm code which is itself profiled, which will
> continue the recursion.
> 
> So the only suggestions I can come up with are: 1) make sure your
> source tree is clean, 2) look at the call graph for profiling
> routines and see if one of the routines in the graph is not marked
> as "no profiling", and 3) try a date-based checkout to see when the
> change happened & examine the change that killed things.
> 
> Good luck!
> 
> Take care,
> 
> Bill

Thanks for responding.

I built a kernel for another machine (from the same source tree, but
with a different configuration), and it doesn't reboot when you enable
profiling. That made me suspect 1), because building it required a new
(clean) tree. However, the original kernel still reboots after
rebuilding from a clean tree. [Is there a short list of the most
probable cases where one should clean the tree between kernel builds
(or should one always do so for current)?]

So this problem may be either specific to this kernel configuration or
this particular machine.

About getting a call graph for 2): is there a #define that marks "no
profiling" (or is it done some other way)?

For 3), since it does not reboot with 3.0, should that branch point be
my starting point (that's a bit far back :-( )? Should I also try
varying the configuration file to see if it only happens with certain
options (e.g. stripping it down to the bare minimum and then gradually
testing more options)?
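
A rough sketch of how the option search could be organised, rather
than adding options back one at a time. It assumes exactly one option
is responsible and that the reboot depends only on whether that option
is enabled. The function name, the option names, and the `reboots`
callback are all hypothetical; in practice the callback would mean
"build a kernel with these options, boot it, run 'kgmon -b', and
report whether the machine rebooted".

```python
def find_culprit(options, reboots):
    """Binary-search for the single option that triggers the reboot.

    options: list of option names (hypothetical placeholders).
    reboots: callable taking a list of enabled options and returning
             True if a kernel built with them reboots on 'kgmon -b'.
    Assumes exactly one option in `options` is responsible.
    """
    candidates = list(options)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if reboots(first_half):
            # The culprit is among the options we just enabled.
            candidates = first_half
        else:
            # Otherwise it must be in the half we left out.
            candidates = candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for the real build/boot/test cycle: pretend OPT_D is bad.
    opts = ["OPT_A", "OPT_B", "OPT_C", "OPT_D", "OPT_E"]
    print(find_culprit(opts, lambda enabled: "OPT_D" in enabled))
```

With N candidate options this needs roughly log2(N) kernel builds
instead of N. If the reboot depends on a combination of options, the
single-culprit assumption breaks and the result would be misleading.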
