Subject: Re: Adding a size parameter to stackgap_init()
To: David Laight <david@l8s.co.uk>
From: Wolfgang Solfrank <ws@tools.de>
List: tech-kern
Date: 03/18/2002 14:50:26
Hi,

> An implementation might have passed the system call parameters
> in some foul, horrid, way.

Hmm, I don't think that you can call this "some foul, horrid, way".
See below.

> But I would have thought there
> was an extensible stack there somewhere - even if it only actually
> contains the env and argv structures.

Well, the kernel doesn't really rely on the existence of a stack
except in some very limited way (the following is based mostly on
CISC architectures; it's handled differently on at least some of the
RISC architectures):

1. It has the concept of a stack limit, though a rather vague one.
   You can think of it as a preallocated memory area that is set up
   at program start, although, at least on some architectures, the
   kernel doesn't allow page faults below the current value of the sp
   in this area.  However, the kernel doesn't require the sp to point
   into this area at all.

2. In order to do a syscall, you push the parameters on the stack and
   do the syscall instruction (well, there may be an additional return
   address on the stack, but this isn't used by the kernel at all).
   The only thing that is relevant to the kernel here is that the sp
   points to some memory area where the kernel can get the syscall
   parameters.

3. On signal delivery, the kernel does push a signal context onto the
   stack and calls the signal handler.  However, in order to support
   things like threaded code (see below), you can instruct the kernel to
   do signal delivery on a separate stack.  I.e., most of the time, the
   user code can run without a stack pointed to by sp.

> > There are even some architectures (which we support) that don't have
> > a real sp, but (for programs written in a language that depends on
> > a stack) simulate one by using an ordinary register.
> 
> Erm ARM? pdp-11? :-)

Well, I thought more of all the RISC chips, i.e., SPARC, PowerPC etc.
And yes, ARM, too.  I.e., those machines that don't have a stack concept
defined by the hardware, but only by software convention.

The pdp-11 does define a stack in hardware, where the return address on
a function call is pushed on the stack.

> I would have thought that NetBSD relied on a stack based system.
> I can't see you getting any unix apps running without one.
> After all the kernel needs one itself.....

The kernel is written in a language which gets compiled in a way that
relies on a stack.  However, this doesn't preclude running any other
code in userland.

> My guess is that the memory layout is independent of the object file
> type!  It is much more of an OS dependent (constrained by system hw)
> issue.

Well, the ELF ABI for a given architecture does define quite a few things
about the memory layout (just as the a.out ABI did).  For every architecture
it defines the conventions used to do subroutine calls (like, e.g., for those
RISC chips, it defines what register is used as the sp), the way you
implement shared libraries, etc...

> > It would have seriously hindered the development of things like (p)threads,
> 
> > alloca
> 
> A horrid hack which was originally something like:
>         pop     ret_addr
>         pop     size
>         mov     sp, accumulater         /* function return value */
>         sub     size, sp
>         jmp     ret_addr

Yes, I'm well aware of how alloca was implemented (having started U*X work
on a PDP-11/70 with PWB and V7 :-)).

> > unexec, threaded code etc. (probably even shared libraries, would you
> > have restricted the layout before their advent)
> > in the past and may do so for some future ideas.

Just as an example of the things I have in mind, I'll try to give a short
description of how threaded code works (at least on CISC architectures):

The words of the language are implemented with some small code pieces.
All those pieces end with a return instruction.  The sp points into a list
of addresses of the words you want to execute.  After executing one word,
the return instruction at the end of the word's code just pops the address
of the next word's code and jumps to it.

In effect, you use the stack pointer as the program counter of the
interpreted code.
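
To make this a bit more concrete, here is a minimal model in portable C.
C can't repurpose the real sp, so the variable ip stands in for it, and
the loop in run_thread() does the "pop the next address and jump" that
the return instruction would do on real hardware.  The word names
(w_two, w_add, ...) and the small data stack are invented for this
sketch.

```c
#include <stddef.h>

typedef void (*word_fn)(void);

static long data[16];        /* data stack of the interpreted language */
static int depth;            /* number of values on it (no overflow check) */
static const word_fn *ip;    /* stands in for sp: points into the word list */

/* Each "word" is a small piece of code that simply returns when done. */
static void w_two(void)   { data[depth++] = 2; }
static void w_three(void) { data[depth++] = 3; }

static void w_add(void)
{
    long b = data[--depth];
    long a = data[--depth];
    data[depth++] = a + b;
}

static void w_mul(void)
{
    long b = data[--depth];
    long a = data[--depth];
    data[depth++] = a * b;
}

/* Run a NULL-terminated list of word addresses and return the value
 * left on top of the data stack (0 if the stack is empty). */
long run_thread(const word_fn *thread)
{
    depth = 0;
    ip = thread;
    while (*ip != NULL) {
        word_fn next = *ip++;   /* the "pop" a return instruction would do */
        next();                 /* ... and the jump to the word's code */
    }
    return depth > 0 ? data[depth - 1] : 0;
}

/* The interpreted program "2 3 + 3 *", i.e. (2 + 3) * 3. */
long demo_threaded(void)
{
    static const word_fn thread[] = {
        w_two, w_three, w_add, w_three, w_mul, NULL
    };

    return run_thread(thread);
}
```

In the real thing there is no dispatch loop at all: the list of word
addresses is what sp points at, and each word's own return instruction
advances through it.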

> FWIW another alternative would be to request a page of user memory
> from the system - must be something in the pmap code for that.

What you probably have in mind here is mmapping some anonymous page somewhere
in the user address space for the duration of the system call and using
that instead of a fixed memory area (hopefully not by some special pmap
code, which would have to be implemented for every architecture separately,
but through standard UVM procedures).  Yes, this sounds like a proper
solution to the problem of requiring some arbitrary amount of user memory
for the things that started this thread.

However, I'd say that implementing a system call emulation by munging the
parameters into a different structure, copying them out to user memory just
to copy them in again by the native system call handler, and probably doing
the same thing in the opposite direction is really a "horrid hack".

The current solution (if I understand it correctly, not having followed it
in close detail) of using a separate function to do the work and having
just different helper functions to interface with the user program for
native and emulation mode is much cleaner.
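
As a sketch of what I mean by that structure (not the actual kernel
code): one core function that never touches user memory, plus one thin
entry point per ABI that fetches the arguments in its own layout.  All
names here are hypothetical, and fake_copyin() is just a memcpy()
standing in for copyin(9).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct native_tv { long sec; long usec; };        /* assumed native layout */
struct emul_tv   { int32_t sec; int32_t usec; };  /* assumed emulated layout */

/* Stand-in for copyin(9): fetch len bytes from "user" memory. */
static int fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
    memcpy(kaddr, uaddr, len);
    return 0;
}

/* Core function: works on kernel-side values only. */
static int64_t core_tv_to_usec(int64_t sec, int64_t usec)
{
    return sec * 1000000 + usec;
}

/* Native entry point: fetches the native layout, calls the core. */
int64_t native_entry(const void *uaddr)
{
    struct native_tv tv;

    if (fake_copyin(uaddr, &tv, sizeof(tv)) != 0)
        return -1;
    return core_tv_to_usec(tv.sec, tv.usec);
}

/* Emulation entry point: fetches the foreign layout, same core. */
int64_t emul_entry(const void *uaddr)
{
    struct emul_tv tv;

    if (fake_copyin(uaddr, &tv, sizeof(tv)) != 0)
        return -1;
    return core_tv_to_usec(tv.sec, tv.usec);
}

/* Returns 1 if both entry points give the same, expected result. */
int entries_agree(void)
{
    struct native_tv n = { 1, 500 };
    struct emul_tv e = { 1, 500 };

    return native_entry(&n) == 1000500 && emul_entry(&e) == 1000500;
}
```

Nothing gets converted into the native layout and copied back out to
user space just so that the native handler can copy it in again.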

Ciao,
Wolfgang
-- 
ws@TooLs.DE     Wolfgang Solfrank, TooLs GmbH 	+49-228-985800