Subject: Re: enlightenment on zs overruns
To: None <etmlyfl@etm.ericsson.se>
From: Gordon W. Ross <gwr@mc.com>
List: tech-kern
Date: 11/20/1997 15:50:05
> Date: Thu, 20 Nov 97 10:06:21 +0100
> From: Paul Kranenburg <pk@cs.few.eur.nl>
> 
> The problem was (and has always been) the hefty context loading operation
> in ctx_alloc() on sun4c MMUs (which was carefully placed outside the
> splpmap/splx scope, so changing just the definition splpmap() did not
> have any effect. splpmap can probably be omitted entirely from ctxalloc()
> anyway).

Hurrah!  Glad to see that fixed!

> Date: Thu, 20 Nov 1997 15:36:16 +0100
> From: Lyndon Fletcher <etmlyfl@etm.ericsson.se>
> 
[ context switch cost ]
> 
> I understood that the SPARC, having been designed from the very beginning
> to run a multitasking OS, actually has a fairly efficient context switch?
> Is the problem above a Sun or a NetBSD problem?

It is a NetBSD/sparc problem.  (Poor VM context switch design.)

> Shouldn't the context switch code be highly optimised
> because of the number of times it gets called?

Yes, it should.

SunOS and Solaris use a technique known as an "empty context"
for improving VM context switch times.  The essential idea is
that the context switch code tries to be "lazy" about actually
allocating a new, per-process VM context.

The "empty context" is a special VM context that never has any
user-space mappings in it (only kernel mappings), so it can be
shared by all processes that have not recently faulted on any
user-space address.  The existence of the empty context allows
the context switch code to assume every process has a valid VM
context (though it may be the shared, empty one) so the only
VM context switch work done in cpu_switch is to load the MMU
root with the context currently assigned to the new process.
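
A rough sketch of that switch path in C, to make the idea concrete.
This is a toy model, not the actual sun3 or SunOS code: the names
(proc_sketch, cpu_switch_sketch, mmu_ctx_reg, EMPTY_CTX, NCTX) are
all made up for illustration, and a plain variable stands in for
the MMU context register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; context 0 plays the shared "empty" role. */
enum { EMPTY_CTX = 0, NCTX = 8 };

struct proc_sketch {
    int ctx;    /* assigned VM context; EMPTY_CTX until a private one is needed */
};

static int mmu_ctx_reg = EMPTY_CTX;   /* stand-in for the MMU context register */
static unsigned ctx_reg_writes;       /* counts simulated register writes */

/* Every new process starts out on the shared empty context. */
static void proc_init(struct proc_sketch *p)
{
    p->ctx = EMPTY_CTX;
}

/*
 * The VM part of a context switch: because every process always holds
 * a valid context number, there is nothing to allocate or steal here.
 * The only work is one store to the context register.
 */
static void cpu_switch_sketch(struct proc_sketch *next)
{
    assert(next != NULL && next->ctx >= 0 && next->ctx < NCTX);
    mmu_ctx_reg = next->ctx;
    ctx_reg_writes++;
}
```

Note that the switch routine never tests for "no context"; that
case cannot occur in this model, which is what keeps it cheap.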

When a process is switched to while using the empty context,
it can continue in the kernel indefinitely without any need
for its own user context, until such time as it tries to touch
some user-space address.  At that point (and ONLY that point)
it should allocate its own private VM context, which usually
means stealing one from some other process.  One key point to
note here is that context allocation happens in the user-level
fault handler, which starts out at spl0(), and generally needs
to block interrupts only as is required by the pmap code.
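
The lazy allocation step might look roughly like the following.
Again this is only a sketch under assumptions: the names
(ctx_alloc_sketch, user_fault_sketch, ctx_owner, next_victim) are
hypothetical, the round-robin victim choice is just one simple
policy picked for the example, and the interrupt blocking and
mapping flushes a real pmap needs are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; context 0 is the shared empty context. */
enum { EMPTY_CTX = 0, NCTX = 4 };

struct proc_sketch {
    int ctx;    /* EMPTY_CTX, or a private context in 1..NCTX-1 */
};

static struct proc_sketch *ctx_owner[NCTX]; /* owner of each private context */
static int next_victim = 1;                 /* round-robin steal pointer */
static int mmu_ctx_reg = EMPTY_CTX;         /* stand-in for the MMU register */

/* Give p a private context, stealing one if none is free. */
static int ctx_alloc_sketch(struct proc_sketch *p)
{
    int c;

    /* Look for an unowned private context first. */
    for (c = 1; c < NCTX && ctx_owner[c] != NULL; c++)
        continue;

    if (c == NCTX) {
        /* None free: take the next victim's context (round-robin). */
        c = next_victim;
        next_victim = next_victim % (NCTX - 1) + 1;
        /* The victim drops back to the empty context.  A real pmap
         * would also have to flush the victim's user mappings here. */
        ctx_owner[c]->ctx = EMPTY_CTX;
    }
    ctx_owner[c] = p;
    p->ctx = c;
    return c;
}

/* Called on a user-space fault: the only place a context is allocated. */
static void user_fault_sketch(struct proc_sketch *p)
{
    if (p->ctx == EMPTY_CTX)
        mmu_ctx_reg = ctx_alloc_sketch(p);
    /* ... a real handler would now go on to resolve the fault ... */
}
```

A process that loses its context in this model simply takes
another fault the next time it touches user space, and gets a
fresh context then.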

I've recently changed the Sun3 pmap code to use this design
("empty context" and lazy context allocation) and the result
was a dramatic performance improvement.  The design works
particularly well on the Sun proprietary MMUs because it lets
the cpu_switch routine do all its VM context switch work with
nothing more than a control-space write!  This also works well
with traditional MMUs (page tables in RAM); an example of
this design can be found in the sun3x pmap.

If someone did this for the sparc, I predict that it would
significantly improve performance there as well.  Takers?