current-users: Re: random signals kill my processes with -current

Subject: Re: random signals kill my processes with -current
To: None <perry@piermont.com>
From: Chris G. Demetriou <cgd@CS.cmu.edu>
List: current-users
Date: 01/27/1997 15:54:15
> > It turns out that process 0 overflows its stack during auto-configuration.
> > The layout in locore allows for this to happen, but meanwhile a whole
> > bunch of values in proc0's user structure are now filled with randomness,
> > including, e.g., p_timer[ITIMER_VIRTUAL], which is then propagated to all
> > processes.
> > 
> > I think it's sufficient to re-init proc0's user structure to zero in
> > machdep's cpu_startup() after all auto-configuration is done.
> 
> Wouldn't it be superior to detect and prevent the stack overflow? This
> could happen on other architectures at some point, and the overflow
> might go past the user structure...

So, that's not too easy (impossible to do 'sanely' on some
architectures) and rather wasteful (at least the way things currently
are done).

Currently, the user area and kernel stack are set up like:

	p_addr				     ((char *)p_addr)+USPACE
	|				     |
	|------------------------------------|
	|        :                           |
	| U AREA :     KERNEL STACK AREA     |
	|        :                           |
	|------------------------------------|

On architectures where the stack grows down (do we currently support
any where the stack grows up?  I don't think so!), the stack starts at
((char *)p->p_addr) + USPACE, and grows down towards the U area.
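
As a rough illustration of the layout above, here is a minimal sketch of
the address arithmetic.  The values of USPACE and U_AREA_SIZE (a stand-in
for the size of the user structure) and the kstack_* helper names are
assumptions made up for this example, not the kernel's actual
definitions; the real values are machine-dependent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; the real ones are per-port. */
#define USPACE      8192u   /* U area plus kernel stack */
#define U_AREA_SIZE 1024u   /* stand-in for sizeof(struct user) */

/* Initial kernel stack pointer: the very top of the USPACE region. */
static inline uintptr_t kstack_top(uintptr_t p_addr)
{
    return p_addr + USPACE;
}

/* Lowest address the stack may reach before it overwrites the U area. */
static inline uintptr_t kstack_limit(uintptr_t p_addr)
{
    return p_addr + U_AREA_SIZE;
}

/* Bytes of stack still free below sp; 0 once sp reaches the U area. */
static inline size_t kstack_room(uintptr_t p_addr, uintptr_t sp)
{
    uintptr_t limit = kstack_limit(p_addr);
    return sp > limit ? (size_t)(sp - limit) : 0;
}

/* Nonzero if sp has descended into the U area (silent corruption). */
static inline int kstack_overflowed(uintptr_t p_addr, uintptr_t sp)
{
    return sp < kstack_limit(p_addr);
}
```

Note that nothing in this layout traps when kstack_overflowed() would
become true; the stack simply keeps writing into the U area.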


To make kernel stack overflows detectable in this situation, you'd
need to add an (unmapped) guard page at the end of the kernel stack,
putting the U area into its own page (or into malloc()d memory, but
that's bad because swapping it out then becomes a fair bit harder,
though by no means impossible).  This has the consequence of wasting
most of a page of memory (the page containing the U area), unless you
go the malloc() route.
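
To put a number on the waste, here is a sketch of one possible
guard-page arrangement (U area in its own page, then an unmapped guard
page, then the stack pages).  PAGE_SIZE, U_AREA_SIZE, the struct and the
guard_layout() helper are all hypothetical names and values chosen for
the example; a real port could order or size things differently.

```c
#include <stddef.h>

/* Hypothetical sizes for illustration only. */
#define PAGE_SIZE   4096u
#define U_AREA_SIZE 1024u   /* stand-in for sizeof(struct user) */

struct kstack_layout {
    size_t u_area_off;  /* offset of the page(s) holding the U area */
    size_t guard_off;   /* offset of the unmapped guard page */
    size_t stack_off;   /* offset of the lowest usable stack byte */
    size_t total;       /* total address space reserved */
    size_t wasted;      /* unused bytes in the U area page(s) */
};

/* Lay out U area, guard page and stack_pages pages of kernel stack. */
static struct kstack_layout guard_layout(size_t stack_pages)
{
    struct kstack_layout l;
    /* Round the U area up to whole pages. */
    size_t u_pages = (U_AREA_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;

    l.u_area_off = 0;
    l.guard_off  = u_pages * PAGE_SIZE;
    l.stack_off  = l.guard_off + PAGE_SIZE;
    l.total      = l.stack_off + stack_pages * PAGE_SIZE;
    l.wasted     = u_pages * PAGE_SIZE - U_AREA_SIZE;
    return l;
}
```

With these made-up numbers, guard_layout(2) leaves 3072 of the 4096
bytes in the U area page unused, which is the "most of a page" cost
mentioned above.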

It also means that, on most/many architectures, kernel stack overflows
can't be handled reasonably.  At the time of the overflow there's no
space left to push data onto the stack, so on most architectures
there's no place to put the stack frame that indicates that the stack
grew into the guard page...  Some architectures will deal with this
sanely (e.g. on the alpha, I get sent back to the firmware with a
KSP INVALID fault 8-), but many (e.g. m68k) will appear to
spontaneously reboot.



I don't like trashing other stuff in memory, but neither do I like
spontaneous reboots.

Doing the guard page thing would require much machine-dependent code
to be modified, but would have relatively little effect on the
machine-independent kernel code.  (Two things that I can see right off
that would need to be changed: the swapping code, if the user area
were malloc()'d (which is probably reasonable if this change is made),
and the core dump code.)


chris