Subject: Re: Floating point in the kernel
To: None <tech-kern@netbsd.org>
From: Eduardo E. Horvath <eeh@one-o.com>
List: tech-kern
Date: 09/18/1998 14:15:16
On 18 Sep 1998, Chris G. Demetriou wrote:

> Jason Thorpe <thorpej@nas.nasa.gov> writes:
> > Kernel threads wouldn't solve this, either.  Kernel threads are just that:
> > kernel code, but with their own context.  It has the same problems as any
> > kernel code using FP.
> 
> I believe that the notion was that one day that might be 'fixed.'
> 
> But, because there are lots of them and they may switch often, you
> really want kernel thread switching to be as lightweight as
> possible...  So I still think it's unlikely.

Actually, I would think kernel threads make the entire problem worse,
since kernel context switches would then need to save state that was not
saved before.

I've been thinking over this issue since I would like to eventually use
block load/store instructions in the kernel bcopy().  This, of course, is
highly port-specific.  

Some machines use a delayed fp store algorithm.  When the CPU goes from
kernel to user mode, the floating point hardware is disabled.  The first
time the floating point hardware is accessed, a kernel trap is generated.
The trap handler checks the state of the fp registers.  The kernel
maintains a pointer to the floating point state structure of the last
process to use floating point, so if the registers are not empty, the
trap handler dumps the dirty fp regs there.  If they are empty, it
allocates a structure to hold the process' floating point state, enables
the hardware, and then returns to user mode.
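
The trap path described above can be sketched as a small user-space
simulation in C.  This is an illustration only, not code from any port:
struct cpu, struct proc, fp_disabled_trap(), return_to_user() and
fp_write() are hypothetical names, the "hardware" is a plain array, and
the reload of the faulting process' saved registers is an assumption
about what a real handler would do next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFPREGS 32

struct fpstate { double regs[NFPREGS]; };

/* Hypothetical per-process data; fp stays NULL until first fp use. */
struct proc {
    struct fpstate *fp;
    struct fpstate fp_storage;   /* stands in for a real kernel allocation */
};

/* Hypothetical model of the fp hardware plus the kernel's bookkeeping. */
struct cpu {
    double fpregs[NFPREGS];      /* simulated register file */
    int fp_enabled;              /* models the fp-disable bit */
    int fp_dirty;                /* models the hardware dirty bit */
    struct proc *fpowner;        /* last process to use floating point */
};

/* Going from kernel to user mode disables the fp hardware. */
void return_to_user(struct cpu *c) { c->fp_enabled = 0; }

/* Called when process p touches the fp hardware while it is disabled. */
void fp_disabled_trap(struct cpu *c, struct proc *p)
{
    assert(c != NULL && p != NULL);
    if (c->fpowner != p) {
        /* Dirty registers belong to the previous owner: dump them there. */
        if (c->fp_dirty && c->fpowner != NULL)
            memcpy(c->fpowner->fp->regs, c->fpregs, sizeof c->fpregs);
        /* First fp use by p: "allocate" its state structure. */
        if (p->fp == NULL) {
            memset(&p->fp_storage, 0, sizeof p->fp_storage);
            p->fp = &p->fp_storage;
        }
        /* Assumption: reload p's saved registers before resuming it. */
        memcpy(c->fpregs, p->fp->regs, sizeof c->fpregs);
        c->fp_dirty = 0;
        c->fpowner = p;
    }
    c->fp_enabled = 1;
}

/* Simulated fp instruction: writing a register sets the dirty bit. */
void fp_write(struct cpu *c, int reg, double v)
{
    assert(c->fp_enabled);
    c->fpregs[reg] = v;
    c->fp_dirty = 1;
}
```

In this model a process that never touches floating point never costs a
save or a restore; the work happens only in fp_disabled_trap().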

If a particular kernel routine wanted to use the fp regs, it could check
if the fp registers are dirty and dump them to the fp state structure,
allocate a new fp state structure (possibly on the stack), do its thing, and
then clear out the fp state structure and disable the floating point
registers.  This means that the fp state would need to be saved only when
you enter one of these kernel routines that make use of floating point
registers while the floating point registers are dirty.
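
A rough C sketch of such a kernel routine follows, again as a user-space
simulation.  All names here (struct cpu, kernel_fp_begin(),
kernel_fp_end(), fp_bcopy(), the saves counter, the 64-byte BLOCK size)
are hypothetical, and memcpy() through the simulated register file merely
stands in for real block load/store instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFPREGS 32
#define BLOCK   64               /* assumed block size, for illustration */

struct fpstate { double regs[NFPREGS]; };

/* Hypothetical model of the fp hardware plus the kernel's bookkeeping. */
struct cpu {
    double fpregs[NFPREGS];      /* simulated register file */
    int fp_enabled;
    int fp_dirty;
    struct fpstate *owner_state; /* where dirty registers get dumped */
    unsigned long saves;         /* counts how often a dump was needed */
};

/* Dump dirty registers to their owner, then claim the fp unit. */
void kernel_fp_begin(struct cpu *c, struct fpstate *kstate)
{
    assert(c != NULL && kstate != NULL);
    if (c->fp_dirty && c->owner_state != NULL) {
        memcpy(c->owner_state->regs, c->fpregs, sizeof c->fpregs);
        c->saves++;
    }
    memset(kstate, 0, sizeof *kstate);
    c->owner_state = kstate;     /* the kernel routine's own fp state */
    c->fp_dirty = 0;
    c->fp_enabled = 1;
}

/* Clear out the kernel's fp state and disable the fp registers. */
void kernel_fp_end(struct cpu *c)
{
    memset(c->fpregs, 0, sizeof c->fpregs);
    c->owner_state = NULL;
    c->fp_dirty = 0;
    c->fp_enabled = 0;
}

/* Copy len bytes, staging whole blocks through the fp registers. */
void fp_bcopy(struct cpu *c, const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    struct fpstate kstate;       /* fp state structure on the stack */

    kernel_fp_begin(c, &kstate);
    while (len >= BLOCK) {
        memcpy(c->fpregs, s, BLOCK);   /* stands in for a block load */
        memcpy(d, c->fpregs, BLOCK);   /* stands in for a block store */
        s += BLOCK;
        d += BLOCK;
        len -= BLOCK;
    }
    memcpy(d, s, len);           /* tail shorter than one block */
    kernel_fp_end(c);
}
```

The saves counter makes the cost visible: a second fp_bcopy() call with
no user fp activity in between finds clean registers and saves nothing.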

But you really need to evaluate whether saving the floating point
registers is less expensive than doing this some other way.  In my case
the break-even point for adding a save of 128 bytes (half the fp register
bank) to bcopy() or bzero() comes at a copy length of around 128 bytes,
since the block move is the only way to get cache-allocate.

If your hardware does not have dirty bits or fp-disable, so that you can
delay floating point saves, then adding any fp to the kernel is almost
certainly a major lose.

=========================================================================
Eduardo Horvath				eeh@one-o.com
	"I need to find a pithy new quote." -- me