Subject: Re: Stackghost in OpenBSD: buffer overflow protection
To: None <avalon@cairo.anu.edu.au, clay@daemons.net>
From: None <eeh@netbsd.org>
List: port-sparc
Date: 09/22/2001 18:15:54
| I am still unsure of how expensive the register window
| overflow/underflow processing is.  The paper implies that each process
| gets its own clean set of register windows when started, so only
| programs which have function nesting deeper than X functions need to be
| examined by the kernel (where X is the number of register windows on the
| CPU, typically 8 or 16).  I was under the impression that the kernel and
| C run-time startup code would use most of the clean register windows,
| meaning that almost all function calls would cause register window
| overflows which would need to be examined by the kernel.  That is the
| question I was asking at the conference, which Casper was nice enough to
| answer for me.

Consider register windows to be a cache for the processor's stack, which
they are.  So, when a process makes more calls than there are windows,
the oldest windows need to be spilled to the stack.  System calls can be
considered function calls, so they also spill register windows.
Interrupts also generate function calls, which add register window
pressure.  And whenever a process switch occurs, all the register windows
for the old process must be flushed to the stack and the ones from the
new process must be loaded.
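
To put rough numbers on the "cache for the stack" view, here is a toy
model that counts spill and fill traps for a sequence of calls and
returns.  It is only a sketch: the function name and the default of 7
usable windows (8 windows with one held back) are my assumptions, and it
ignores system calls, interrupts and process switches entirely.

```python
def count_window_traps(events, usable_windows=7):
    """Count spill and fill traps for a sequence of calls and returns.

    events: iterable of "call" / "ret" strings.
    usable_windows: windows available to the process (assumed value).
    """
    depth = 1      # active frames, including the initial one
    resident = 1   # frames currently held in register windows
    spills = 0
    fills = 0
    for ev in events:
        if ev == "call":
            depth += 1
            if resident < usable_windows:
                resident += 1
            else:
                spills += 1   # oldest resident window is written to the stack
        elif ev == "ret":
            if depth == 1:
                raise ValueError("return from the outermost frame")
            depth -= 1
            resident -= 1
            if resident == 0:
                fills += 1    # caller's window is reloaded from the stack
                resident = 1
        else:
            raise ValueError(f"unknown event: {ev!r}")
    return spills, fills
```

In this model, nine nested calls followed by nine returns cost three
spills and three fills, while a program that never goes more than six
calls deep traps not at all.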

| There may be some more efficient ways to prevent stack buffer overflows
| on the SPARC, playing with the MMU, but I don't have any details ironed
| out yet.  SPARCv9 has some neat features, address space identifiers,
| etc, but I'm not sure how to use those yet.

Those things are unlikely to have much application to this problem.
(Although storing the return frame pointer and return pointer in
little-endian format would certainly confuse anyone trying to use one of
these exploits, as well as any debugger.)

There is one thing I think that paper has not addressed.  It presumes
that the return address pointer can be overwritten, and concerns itself
with protecting that, but that alone is insufficient.  The stack/frame
pointer must also be protected, or the stack pointer can be redirected to
an arbitrary location where a complete stack frame has already been
constructed with an arbitrary return address pointer.

Anyway, while twiddling with the value of the return address pointer is
relatively cheap, accessing the PCB for the proper value to twiddle it
with is not and can cause complications if it is not mapped inside the 
TLB.  There are also issues if the stack page that the registers are to
be saved onto is not mapped or even allocated.  And this breaks debuggers
and libc routines that manipulate the stack.
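
For concreteness, the cheap part can be sketched as below.  I am assuming
the twiddle is an XOR with a per-process secret held in the PCB; the
function names and the 32-bit width are made up for illustration, and
none of the expensive parts (getting at the PCB, unmapped stack pages)
are modelled.

```python
import secrets

MASK32 = 0xFFFFFFFF


def new_cookie():
    # Hypothetical per-process secret; assumed to live in the PCB,
    # where userland cannot read it.
    return secrets.randbits(32) | 1   # force nonzero so the XOR always changes the value


def spill_return_address(ret_addr, cookie):
    # On a window spill, the value written to the user stack is twiddled.
    return (ret_addr ^ cookie) & MASK32


def fill_return_address(stored, cookie):
    # On a window fill, the same XOR recovers the original value.
    return (stored ^ cookie) & MASK32
```

An untouched return address survives the round trip.  A value an attacker
wrote over the saved slot comes back XORed with the cookie, so it no
longer points where the attacker intended.  The same property is what
confuses a debugger reading the stack.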

The more complicated scheme, utilizing a separate kernel stack for
stack pointers and return addresses, also has to deal with managing an
arbitrarily deep stack in kernel space that needs to be extended
dynamically, and must be mapped by the TLB during window fill and spill
traps, correlating a user stack frame with the kernel stack pointer
and return PC.

A similarly effective technique would be to always save all the register 
windows to the PCB on trap entry, as is done if the user stack is not
mapped, and restore them from the PCB on return.  This would mean that
the lowest 8 or 16 register windows would be immutable from userland, and
any attacks would need to ensure a call depth greater than the number of
register windows to be effective.  This would be much easier to implement
but would definitely have a performance impact.
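
A toy model of what that buys, under my own simplifying assumptions: the
frames still held in register windows are restored from the kernel's copy,
and only older frames are read back from user memory.  The function and
parameter names are hypothetical.

```python
def returns_after_restore(kernel_copy, user_stack, protected_windows):
    """Return the return addresses the process will actually use.

    kernel_copy: return addresses as saved in the PCB, oldest frame first.
    user_stack: the same frames as stored in user memory, possibly tampered.
    protected_windows: number of newest frames restored from the PCB.
    """
    if len(kernel_copy) != len(user_stack):
        raise ValueError("frame lists must have the same length")
    # Frames older than this index have already been spilled to user memory.
    boundary = max(0, len(kernel_copy) - protected_windows)
    return list(user_stack[:boundary]) + list(kernel_copy[boundary:])
```

With ten frames and eight protected windows, an overwrite aimed at the
newest frame is discarded, while one aimed at the second-oldest frame
still takes effect.  That is the "call depth greater than the number of
register windows" condition.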

But, all things considered, I'm not convinced it's worth the effort and
performance hit to protect just one architecture from what are effectively
bugs in privileged programs that should never have been there in the first
place.

Eduardo