port-i386: Re: per-cpu TSS

Subject: Re: per-cpu TSS
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Andrew Doran <ad@netbsd.org>
List: port-i386
Date: 12/11/2007 18:50:00

On Tue, Dec 11, 2007 at 08:52:26PM +0900, YAMAMOTO Takashi wrote:

> > attached diff is to use per-cpu tss rather than per-process tss.
> > the number of processes is no longer limited by the number of gdt slots
> > with this.  i tried hbench's lat_ctx benchmark and didn't notice any
> > performance differences.  (although it might differ for processes with
> > i/o bitmaps.)
> > any comments?
> > 
> > todo: kvm86
> > 
> > YAMAMOTO Takashi
> 
> systems are getting larger and the limit is getting relatively smaller.
> 
> is there anyone who still believes switch-by-tss is faster?

I think it is safe to assume that switching via a TSS gate is going to be
slower than using simple instructions to achieve the same goal, especially
since we have to save a lot of state just to get as far as cpu_switchto().

> can anyone explain how switch-by-tss helps mach-like continuation?
> (i don't think it helps.)

I don't see how it helps, either.

> does anyone want to implement "gdt entry swapping" which mycroft suggested?
> although i'm not really sure if i understand what he meant correctly,
> it sounds like too much complexity for little gain.

I read it as having one LDT descriptor in each per-CPU GDT, which is then
updated in cpu_switchto() to point to a new area. We already update GDT
descriptors for the user-set fsbase and gsbase.

I don't see much reason to have a per-process LDT or to use descriptors in
the LDT (e.g. LUDATA_SEL) unless some application needs them. Perhaps for
emulation, but I don't believe that they are needed for native programs.

The noexec stack stuff could use the same trick as fsbase or gsbase to
adjust the limit on GUCODE_SEL (and we could probably remove the high-limit
selector).

Andrew