port-mips: Lazy FPU context switch

Subject: Lazy FPU context switch
To: None <port-mips@netbsd.org>
From: Toru Nishimura <nisimura@itc.aist-nara.ac.jp>
List: port-mips
Date: 04/17/2000 10:48:30
Here is an explanation of how lazy FPU context switching works.

Tohru Nishimura

--

Saving and loading the entire FPU hardware context to and from the
reserved memory area held by each process (it resides in u_pcb in
NetBSD/mips) is troublesome.  It's unrealistic to switch the whole
FPU context from process to process every time cpu_switch() is
called, because FPU context save and reload operations consume a
significant number of CPU cycles.

Modern CPUs provide a way to disable the execution of FP insns.
When the CPU is about to execute an FP insn while the FPU is
disabled, it posts an exception and the operating system starts
handling the 'FPU is unavailable to me' condition for the executing
process.  The handler checks and prepares to allow the process to
use the FPU, then restarts the process at the FP insn that posted
the exception.  This time the FP insn executes normally, and no
further 'FPU is unavailable' condition arises until another process
snatches the FPU from this process later.

Every process is created without FPU ownership and is inhibited from
using the FPU.  Unless the process ever executes an FP insn, nothing
special happens to it and the process terminates peacefully.

If a process inhibited from using the FPU is about to execute an FP
insn, the CPU posts an 'unavailable' exception.  The global variable
fpcurproc points to the process that has ownership of the FPU.  At
the moment of the 'unavailable' exception, the FPU hardware holds
the snapshot of the 'runtime image' of the owner process, which is
different from curproc, the process that posted the exception.  The
unavailable handler saves the FPU hardware context, made up of a
large number of FP registers, on behalf of fpcurproc, and loads
curproc's FPU hardware context into the FPU registers.  The initial
load of a process's FPU context clears the entire FPU.  In this way,
the FPU context switch is deferred until another process is found
requesting to use the FPU.  Because the vast majority of programs
execute no FP insns during their lifetime, this deferred, lazy FPU
context switch works handsomely, avoiding rather burdensome FPU
save/load operations.

The burdensome FPU context switch syndrome is similar to the one an
MMU faces on process context switch.  An MMU is rather complicated
machinery that may hold a complex internal 'state' describing the
process' address space or, more weirdly, a 'task description' of the
runtime environment, nature and features of processes as defined by
the CPU hardware foundation.  Some MMUs have dedicated register(s)
pointing to the memory region which describes the process address
space.  In that case the MMU context switch is done handsomely by
loading the new process's value into the dedicated register with a
special MMU insn.  A certain CPU design is widely known to do a
hilariously spectacular job of MMU context switch, saving/loading a
handful of registers and traversing memory regions to establish the
new process's runtime context, at the cost of astonishingly many CPU
cycles.  This costly hardware-supported context switch capability
is seldom used in practice, and many consider it CISCy or a waste of
silicon.

--