Subject: Re: per-cpu TSS
To: David Laight <david@l8s.co.uk>
From: Charles M. Hannum <abuse@spamalicious.com>
List: port-i386
Date: 11/13/2003 18:49:28
On Thursday 13 November 2003 05:55 pm, David Laight wrote:
> > Uh, "sort of".  You only need a kernel stack *per process* because we
> > always store state on the stack when we switch.  We can arrange (cf.
> > later versions of Mach) to not do so in many cases, and instead have a
> > kernel stack per CPU most of the time.  This would be especially
> > beneficial for threads, as it would remove 8k of per-LWP overhead.
>
> Surely you can only do that if you switch out a process running in
> userspace. Most processes are blocked in the bowels of the kernel and have
> far too much kernel state to keep anywhere else.

Incorrect.  See Nathan's reference.

> For threads and LWPs the m-n implementation (presumably) means that threads
> waiting on other threads don't have an LWP.

Which is why I said "per-LWP," not "per-thread."  However, in most interesting 
applications (e.g. threaded servers), a very large fraction of threads spend 
most of their time sleeping (e.g. waiting for I/O), and so have LWPs.

(Slightly out of order here...)

> In any case 1000 processes by 8k is 'only' 8MB.

You're looking at it along the wrong axis.

Consider that you have a fixed amount of memory, call it X.  The rest of the 
LWP overhead is O(1k) (I haven't actually measured it, but that's an EWAG).  
Currently, then, the number of LWPs you can have is roughly X/9k.  If you 
eliminate the kernel stack most of the time, or entirely, it becomes X/1k -- 
a 9-fold improvement.

Can you honestly say that isn't worthwhile?