port-alpha: Re: Lazy FP context switch reconsidered

Subject: Re: Lazy FP context switch reconsidered
To: None <thorpej@zembu.com, port-alpha@netbsd.org>
From: Lets Go Canes <letsgonhlcanes@yahoo.com>
List: port-alpha
Date: 07/16/2001 12:32:39

Hi all.

I certainly like the simplicity of the new approach - it treats the FP
context the same as the rest of the context, but with enough smarts to
only do what is needed.

Having said that, the only other thing that comes to mind would be a
scheme where once a process has used the FPU on a given processor, the
process would develop an affinity for that processor for as long as the
FPU state remains unchanged.  But if it is likely that other processes
are going to use the FPU, then it probably isn't worth it.

If an FPU has the context of a given process, and the process is being
switched-in, is there anything that drives it to the processor with its
FP context *if* that processor is idle?
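
A rough sketch of the kind of placement check the question above is asking about. Everything here is hypothetical and for illustration only: `choose_cpu`, `p_fpcpu`, `NO_CPU`, and the `struct cpu` fields are made-up names modeled loosely on the curproc/fpcurproc pair described in the quoted message, not actual NetBSD scheduler code.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU   4        /* hypothetical CPU count for this sketch */
#define NO_CPU (-1)     /* hypothetical "no processor" marker */

struct proc {
	int p_fpcpu;    /* hypothetical: CPU whose FPU holds our state, or NO_CPU */
};

struct cpu {
	const struct proc *ci_curproc;   /* process running here, NULL if idle */
	const struct proc *ci_fpcurproc; /* process whose FP state the FPU holds */
};

/*
 * Pick a processor for p.  Prefer the processor that still holds p's
 * FP state, but only if that processor is idle.  Otherwise fall back
 * to the first idle processor.  Returns NO_CPU if nothing is idle.
 */
int
choose_cpu(const struct cpu cpus[], int ncpu, const struct proc *p)
{
	int i, c = p->p_fpcpu;

	assert(ncpu > 0);
	if (c >= 0 && c < ncpu &&
	    cpus[c].ci_curproc == NULL && cpus[c].ci_fpcurproc == p)
		return c;       /* idle, and our FP state is still there */

	for (i = 0; i < ncpu; i++)
		if (cpus[i].ci_curproc == NULL)
			return i;       /* any idle processor will do */

	return NO_CPU;
}
```

With the save-on-switch scheme proposed in the quoted message, the FP state is written back to the PCB at every switch away, so a check like this would save at most a reload and may not be worth the extra bookkeeping.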

--- Jason R Thorpe <thorpej@zembu.com> wrote:
> So, I want to stabilize the Alpha MP kernel a bit, and as part of
> this, I want to reduce some of the complexity in the code that deals
> with multiple processors.
> 
> One of the most complex things here is the MP-safe lazy FPU context
> switch code.
> 
> The way we currently do lazy FP context switch is like so:
> 
> 	(1) Each processor has two variables: curproc, and fpcurproc.
> 	    curproc is the process currently running on the processor,
> 	    and fpcurproc is the process whose FP state the processor
> 	    currently holds.
> 
> 	(2) When a process returns to userspace, if it is fpcurproc,
> 	    then the FPU is enabled.  Otherwise it is disabled.
> 
> 	(3) When a process uses the FPU, and the FPU is disabled,
> 	    an FEN trap is taken.  If the process has not yet used
> 	    the FPU, then it is marked as having done so.  If it
> 	    has used the FPU, the kernel determines the processor
> 	    on which the FP state resides.  If not `self', then a
> 	    "discard FPU" interrupt is sent to the processor that
> 	    has it, and `self' waits until the other processor has
> 	    sync'd it back to the process's PCB.  Once the state is
> 	    in the PCB, it is loaded into `self's FPU.  fpcurproc
> 	    is then set to curproc.  Then see step 2.
> 
> Now, the code that does this is kind of complicated, and tricky to
> get right.  I also suspect that there is a lot of unnecessary
> overhead here.
> 
> This is exacerbated by the fact that GCC likes to emit FP insns for
> inline block moves.  Thus, the number of processes that use FP is
> inflated somewhat.
> 
> I decided to instrument this.  I added some code to the FEN trap
> path that collects two different statistics:
> 
> 	* "FP proc use" -- when a process uses FP for the first
> 	  time, this counter is incremented.
> 
> 	* "FP proc re-use" -- when a process that has previously
> 	  used FP takes a FEN trap to be able to use it again,
> 	  this counter is incremented.
> 
> I then booted the kernel and immediately built a GENERIC kernel.  I
> wanted the number of context switches to be high, so I used "make -j4".
> 
> When the compile finished, I read some counters.  Here are the
> interesting numbers:
> 
> 	93470 cpu context switches (from vmstat -s)
> 
> 	event                               total     rate type
> 	FP proc use                          7728        5 misc
> 	FP proc re-use                      44371       34 misc
> 	soft serial                          1833        1 intr
> 	soft net                             2544        1 intr
> 	soft clock                           4068        3 intr
> 	cpu0 clock                        1562602     1209 intr
> 	cpu0 device                         47304       36 intr
> 	kn300 irq 12                         4863        3 intr
> 	kn300 irq 16                           68        0 intr
> 	kn300 irq 36                           26        0 intr
> 	kn300 irq 40                        40514       31 intr
> 	isa irq 4                            1833        1 intr
> 
> So, this is how I have interpreted the numbers:
> 
> 	(1) Nearly 1/2 of all context switches resulted in the
> 	    process using FP again, and having to take a trap
> 	    in order to do so.
> 
> 	(2) The rate at which these traps happened is nearly
> 	    as high as interrupts from devices, and is higher
> 	    than the interrupt rate from the SCSI controller
> 	    to which the disks in the RAID volume holding the
> 	    source tree are attached.
> 
> 	(3) Since the number of processes that use FP for the
> 	    first time is a fair bit smaller than the re-use
> 	    count, it suggests that processes that use FP once
> 	    are very likely to use it again.
> 
> What this suggests to me is that lazy FP context switching might not
> be such a hot idea on the Alpha port.  What I'd like to do is change
> the FP context switching algorithm to something like this:
> 
> 	(1) When a process returns to userspace, if the process
> 	    has used FP, enable the FPU.
> 
> 	(2) When a process is switched away from, if it has used the
> 	    FPU, save the FP state (thus releasing the FPU for someone
> 	    else to use).
> 
> 	(3) When a process is switched to, if it has used the FPU,
> 	    restore the FP state.
> 
> 	(4) When a process uses FP for the first time, simply mark it
> 	    as having used the FPU, and `restore' the FP state from
> 	    the PCB (the FP state is zero'd when a process exec's).
> 
> This method is a whole lot simpler, eliminates the need to deal with
> other processors, and may in fact reduce the amount of overhead
> involved for processes that do in fact use FP.
> 
> Thoughts/comments?
> 
> -- 
>         -- Jason R. Thorpe <thorpej@zembu.com>

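The four steps of the quoted proposal can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct layouts, `used_fp`, `fpu_enabled`, and the function names `proc_exec`, `cpu_switch`, `return_to_user`, and `fen_trap` are all hypothetical, and a plain array stands in for the hardware FP registers. It is not the actual NetBSD/alpha code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NFPREGS 32      /* size of the modeled FP register file */

struct fpstate {
	unsigned long regs[NFPREGS];
};

struct proc {
	bool used_fp;           /* hypothetical "has used the FPU" flag */
	struct fpstate pcb_fp;  /* FP state saved in the PCB */
};

struct cpu {
	struct proc *curproc;   /* process currently running here */
	bool fpu_enabled;
	struct fpstate fpu;     /* stand-in for the hardware FP registers */
};

/* The FP state is zeroed when a process execs. */
void
proc_exec(struct proc *p)
{
	memset(&p->pcb_fp, 0, sizeof(p->pcb_fp));
	p->used_fp = false;
}

/* Steps 2 and 3: save on switch away, restore on switch to. */
void
cpu_switch(struct cpu *ci, struct proc *next)
{
	struct proc *prev = ci->curproc;

	if (prev != NULL && prev->used_fp)
		prev->pcb_fp = ci->fpu;         /* step 2 */
	if (next->used_fp)
		ci->fpu = next->pcb_fp;         /* step 3 */
	ci->curproc = next;
}

/* Step 1: enable the FPU only for processes that have used FP. */
void
return_to_user(struct cpu *ci)
{
	assert(ci->curproc != NULL);
	ci->fpu_enabled = ci->curproc->used_fp;
}

/* Step 4: the first FP use traps; mark the process and load its state. */
void
fen_trap(struct cpu *ci)
{
	struct proc *p = ci->curproc;

	assert(p != NULL && !p->used_fp); /* in this model only first use traps */
	p->used_fp = true;
	ci->fpu = p->pcb_fp;              /* zeroed state from exec */
	ci->fpu_enabled = true;
}
```

Nothing in this model looks at any other processor: all FP state a process owns is either in the local register file while it runs or in its own PCB, which is where the simplification over the fpcurproc scheme comes from.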

=====
--------------
Lets Go Canes!
