Subject: Kernel profiling - solved? (or where, oh where has nullproc gone?)
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Michael L. Hitch <mhitch@lightning.oscs.montana.edu>
List: port-mips
Date: 02/27/1999 11:37:10
On Tue, 23 Feb 1999, Jonathan Stone wrote:

> My hunch is the problem area is somewhere in the
> cpu_switch()/idle()/switch_exit()/sw1 machinery, since I consistently
> saw traps immediately after the asm() label in tsleep() -- when
> running on proc0, so it's probably setting a process up to be reaped.

  This is pretty close to what's wrong...

  I believe the problem is Castor's removal of nullproc from
switch_exit().  switch_exit() is now using the kernel stack for proc0 -
but proc0 is an active process.  It's a very bad idea to have two
processes/threads using the same stack!

  Upon startup, after mach_init() returns, the kernel stack is changed to
use the proc0paddr stack, and main() is called.  After main() has forked
the init process and started the pagedaemon and reaper threads, it calls
uvm_scheduler(), which ends up in tsleep() - using the proc0paddr stack.

  Later, switch_exit() gets called and switches to the same proc0paddr
stack.  This has normally worked so far because exit2() has not used
enough of the proc0 stack to overwrite the critical portion of the stack
frame used by uvm_scheduler()/tsleep()/mi_switch()/cpu_switch().  Adding
kernel profiling appears to increase the stack usage just enough that
the exit2() path now clobbers something critical for uvm_scheduler().
The next time uvm_scheduler() exits (or attempts to exit) out of its
tsleep(), the stack has been corrupted.
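
  To make the overlap concrete, here is a toy user-space model, not
kernel code.  It assumes both users start at the same stack top and grow
downward; MODEL_STACK_SIZE, the marker bytes, the helper names and the
frame offset used in the usage note are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_STACK_SIZE 4096	/* hypothetical size, not the real USPACE */
#define FRAME_MARKER     0x5a	/* stands in for a sleeper's saved frame */
#define EXIT_SCRIBBLE    0xee	/* stands in for whatever the exit path writes */

struct model_stack {
	unsigned char bytes[MODEL_STACK_SIZE];
};

/* The sleeper leaves a live frame of len bytes, start_depth bytes below the top. */
static void
sleeper_save_frame(struct model_stack *s, size_t start_depth, size_t len)
{
	assert(start_depth + len <= MODEL_STACK_SIZE);
	memset(s->bytes + MODEL_STACK_SIZE - start_depth - len, FRAME_MARKER, len);
}

/* The exit path restarts at the same top and uses exit_depth bytes. */
static void
exit_path_run(struct model_stack *s, size_t exit_depth)
{
	assert(exit_depth <= MODEL_STACK_SIZE);
	memset(s->bytes + MODEL_STACK_SIZE - exit_depth, EXIT_SCRIBBLE, exit_depth);
}

/* Returns 1 if the sleeper's frame is untouched, 0 if it was overwritten. */
static int
frame_intact(const struct model_stack *s, size_t start_depth, size_t len)
{
	const unsigned char *p = s->bytes + MODEL_STACK_SIZE - start_depth - len;

	for (size_t i = 0; i < len; i++)
		if (p[i] != FRAME_MARKER)
			return 0;
	return 1;
}
```

  With a frame placed (hypothetically) 0x100 bytes below the top, an exit
path that uses 0x100 bytes leaves it alone, while one that uses 0x140
bytes overwrites it.  That is the "just enough more stack" effect
described above.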

  I'm now running a fully-profiled kernel by calling main() using the
first half of proc0's u-area for the initial stack, and using the second
half for switch_exit()'s stack.  A dump of proc0paddr shows that proc0
has used at most 0x960 bytes (which gets uncomfortably close to the proc0
pcb), and switch_exit() has used at most 0x140 bytes.  I can also see that
the 0x140 bytes used by switch_exit() is sufficient to overwrite the stack
frame in uvm_scheduler() when both are using the same stack.
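
  For anyone wanting to repeat the measurement, one common way to get a
"used at most N bytes" figure is to paint the stack area with a known
byte before use and later scan for how much of the paint survives.  The
sketch below is user-space C and only an assumption about how such a
figure could be obtained; FILL_BYTE and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xa5	/* hypothetical paint pattern */

/* Paint the whole stack area before anything runs on it. */
static void
stack_paint(unsigned char *base, size_t size)
{
	memset(base, FILL_BYTE, size);
}

/*
 * High-water mark of a downward-growing stack: count the still-painted
 * bytes from the low end and subtract from the size.  This underestimates
 * if the deepest bytes written happen to equal FILL_BYTE.
 */
static size_t
stack_high_water(const unsigned char *base, size_t size)
{
	size_t untouched = 0;

	while (untouched < size && base[untouched] == FILL_BYTE)
		untouched++;
	return size - untouched;
}
```

  Painting each half of the area separately and scanning each half would
give one figure per stack.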

  Ah - things become much clearer now:  exit2() calls wakeup(), which just
happens to be in kern_synch.c.  Apparently the call to _mcount() in
wakeup() is just enough to clobber uvm_scheduler()'s stack frame.  I'd
guess that if kern_synch.c is not profiled, then it doesn't corrupt the
stack frame used by uvm_scheduler().

  So it appears that switch_exit() needs to use a different stack than
proc0.  It doesn't appear to need a very big one, so allocating USPACE for
the stack is probably overkill.  Even allocating a single page might be
more space than actually required, although I'm not certain if more might
be used in some situations.  The wakeup() in exit2() will force the reaper
process to run, so switch_exit() will have a runnable process to switch to
after exit2() returns and will not be running on the current stack for
very long.  Also, since switch_exit() is called at splhigh, there shouldn't
be any worries about interrupts using more stack space.
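
  As a rough illustration of the "small dedicated stack" idea, the sketch
below reserves a static buffer and computes an aligned initial stack
pointer at its top.  It is plain C rather than the assembly a real
switch_exit() would need, and EXIT_STACK_SIZE, STACK_ALIGN, exit_stack
and exit_stack_top() are all hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXIT_STACK_SIZE 4096	/* hypothetical: one page */
#define STACK_ALIGN     8	/* hypothetical alignment requirement */

/* Dedicated buffer, shared with no process's kernel stack. */
static unsigned char exit_stack[EXIT_STACK_SIZE];

/* Initial stack pointer: the top of the buffer, rounded down to STACK_ALIGN. */
static uintptr_t
exit_stack_top(void)
{
	uintptr_t top = (uintptr_t)(exit_stack + EXIT_STACK_SIZE);

	return top & ~(uintptr_t)(STACK_ALIGN - 1);
}
```

  Against the 0x140 bytes measured above, a page leaves a lot of
headroom; whether something smaller is safe depends on the worst case,
which I have not measured.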

Michael