Subject: Re: Performance of various memcpy()'s
To: Bang Jun-Young <junyoung@mogua.com>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-i386
Date: 10/22/2002 10:59:51
On Wed, Oct 23, 2002 at 02:08:24AM +0900, Bang Jun-Young wrote:

 > Here is a new version of i686_copyin(). By saving the FPU state on the stack,
 > I could make it work with programs that use FP operations, including
 > XFree86, xmms, mozilla, etc.

Cool.
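
On i386 the eight MMX registers are aliased onto the x87 floating-point registers, so an MMX copy loop inside copyin() clobbers whatever FP state a user process has loaded unless that state is saved first. What follows is a rough user-space model of the save-on-the-stack approach, not kernel code. `fpu_regs_t`, `cpu_fpu`, `fpu_save()`, `fpu_restore()` and `mmx_copy_model()` are hypothetical stand-ins for the real fnsave/frstor and movq instructions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the MMX registers alias the x87 register file,
 * represented here by a plain global array. */
typedef struct { uint64_t reg[8]; } fpu_regs_t;
static fpu_regs_t cpu_fpu;                  /* the "live" FPU/MMX registers */

static void fpu_save(fpu_regs_t *area)          { *area = cpu_fpu; } /* ~fnsave */
static void fpu_restore(const fpu_regs_t *area) { cpu_fpu = *area; } /* ~frstor */

/* Copy 8 bytes at a time through "register" mm0, as a movq loop would,
 * which destroys the previous contents of that register. */
static void mmx_copy_model(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 8) {
        memcpy(&cpu_fpu.reg[0], s, 8);      /* movq (src), %mm0 */
        memcpy(d, &cpu_fpu.reg[0], 8);      /* movq %mm0, (dst) */
        s += 8;
        d += 8;
        len -= 8;
    }
    memcpy(d, s, len);                      /* byte-sized tail */
}

/* Save the FP state in a local (stack) buffer, copy, then restore it. */
static void copy_preserving_fpu(void *dst, const void *src, size_t len)
{
    fpu_regs_t saved;                       /* lives on the kernel stack */

    fpu_save(&saved);
    mmx_copy_model(dst, src, len);
    fpu_restore(&saved);
}
```

Without the save/restore pair, `cpu_fpu.reg[0]` would be left holding the last 8 bytes copied, which is the kind of corruption that broke FP-using programs before.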

 > In this version, I set the minimum length to use MMX bcopy to 512.
 > Since I don't know of a kernel profiling tool or a method to measure
 > copyin performance at the kernel level, the number may be too small, or
 > too large.
 > 
 > Possible todo:
 >  - i686_copyout(), i686_kcopy(), i686_memcpy(), ...
 >  - use prefetch and movntq instructions for PIII/P4 or Athlon.
 >  - use npxproc, as FreeBSD does, to eliminate the overhead of saving
 >    the FPU state.
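
The 512-byte cutoff amounts to a length test in front of two copy paths. A minimal sketch, assuming hypothetical names (`mmx_copy_min`, `fast_copy()`, `small_copy()`, `copy_dispatch()` and the path counters); the fast path here only mimics 64-byte MMX blocks and omits the FPU save/restore:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tunable, 512 as in the patch.  A variable rather than a
 * #define so it could be changed at run time while measuring. */
static size_t mmx_copy_min = 512;

static unsigned long n_fast, n_small;       /* which path ran (example only) */

/* Stand-in for the MMX path: 64-byte blocks, as an unrolled movq loop
 * would move them.  The real routine also saves/restores the FPU. */
static void fast_copy(unsigned char *d, const unsigned char *s, size_t len)
{
    uint64_t block[8];

    while (len >= sizeof block) {
        memcpy(block, s, sizeof block);
        memcpy(d, block, sizeof block);
        s += sizeof block;
        d += sizeof block;
        len -= sizeof block;
    }
    memcpy(d, s, len);                      /* tail, fewer than 64 bytes */
    n_fast++;
}

/* Stand-in for the existing non-FPU path (e.g. rep movs). */
static void small_copy(unsigned char *d, const unsigned char *s, size_t len)
{
    while (len--)
        *d++ = *s++;
    n_small++;
}

/* Below the cutoff, the FPU save/restore would cost more than it gains. */
static void copy_dispatch(void *dst, const void *src, size_t len)
{
    if (len >= mmx_copy_min)
        fast_copy(dst, src, len);
    else
        small_copy(dst, src, len);
}
```

With the cutoff in a variable, one could time the same workload (say, large write(2) calls in a loop) at several values without rebuilding the kernel, which is one way around the lack of a kernel profiling tool.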

A few things:

	* i686_copyout() is actually pretty important, because e.g.
	  we don't have zero-copy socket reads yet (only writes), so
	  a fast copy routine is important there.

	* Same for i686_kcopy() - it's used in the NFS path, at least,
	  and could significantly improve performance there.

	* i686_memcpy() - be careful, because you have the whole
	  "memcpy() is allowed in interrupts" thing.  It's probably
	  not worth bothering with this one, because there's a
	  potential to spend a LOT of time saving/restoring FPU
	  context.

	* Yes, only save/restore the FP state if npxproc != NULL.
	  In the MULTIPROCESSOR case, you also need to be careful
	  because you could get an IPI from another CPU requesting
	  the FP state, so you'll need to make sure to provide the
	  correct one!

	  In fact, it's probably best to save to the npxproc's PCB,
	  and restore it back from there, rather than the stack.
	  (Cuts down on potentially large stack usage, too.)

	* You have to handle the fxsave/fxrstor case, i.e. when the CPU
	  has SSE/SSE2, since fnsave/frstor do not save the XMM registers
	  or MXCSR.
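
The npxproc, PCB and fxsave points together might look roughly like the following user-space model. It is a sketch under assumptions, not NetBSD code: `struct pcb`, `struct proc`, `fpu_area`, `fpu_saved_len`, `live_fpu`, `cpu_has_fxsr`, `kfpu_enter()` and `kfpu_leave()` are hypothetical names, memcpy() stands in for fxsave/fnsave and fxrstor/frstor, and the MULTIPROCESSOR IPI case is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FNSAVE_SIZE 108   /* x87/MMX state, 32-bit protected-mode format */
#define FXSAVE_SIZE 512   /* x87 + MMX + XMM + MXCSR; 16-byte aligned */

struct pcb {                                /* hypothetical, trimmed down */
    _Alignas(16) unsigned char fpu_area[FXSAVE_SIZE];
    size_t fpu_saved_len;                   /* 0 = nothing saved */
};
struct proc { struct pcb p_pcb; };          /* hypothetical */

static struct proc *npxproc;                /* owner of the live FPU state */
static bool cpu_has_fxsr;                   /* CPU has SSE/SSE2 (FXSR) */
static unsigned char live_fpu[FXSAVE_SIZE]; /* model of the FPU registers */

/* Before kernel MMX use: save only if some process owns the FPU,
 * into that process's PCB rather than onto the kernel stack. */
static struct proc *kfpu_enter(void)
{
    struct proc *owner = npxproc;

    if (owner == NULL)
        return NULL;                        /* nothing to preserve */
    size_t len = cpu_has_fxsr ? FXSAVE_SIZE : FNSAVE_SIZE;
    memcpy(owner->p_pcb.fpu_area, live_fpu, len);   /* fxsave or fnsave */
    owner->p_pcb.fpu_saved_len = len;
    return owner;
}

/* After the copy: restore from the same PCB (fxrstor or frstor). */
static void kfpu_leave(struct proc *owner)
{
    if (owner == NULL)
        return;
    memcpy(live_fpu, owner->p_pcb.fpu_area, owner->p_pcb.fpu_saved_len);
    owner->p_pcb.fpu_saved_len = 0;
}
```

The save area is sized for the larger fxsave image so one PCB layout serves both CPU types, and the 512-byte buffer stays off the kernel stack.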

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>