Subject: Re: lib/35535: memcpy() is very slow if not aligned
To: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org
From: David Laight <david@l8s.co.uk>
List: netbsd-bugs
Date: 02/03/2007 21:25:01
The following reply was made to PR port-amd64/35535; it has been noted by GNATS.

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/35535: memcpy() is very slow if not aligned
Date: Sat, 3 Feb 2007 21:23:31 +0000

 On Sat, Feb 03, 2007 at 02:25:02PM +0000, Kimura Fuyuki wrote:
 >  
 >  The real (what's real?) latency for rep instructions can be seen here  (8.3): 
 >  http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
 
 Hmmm... I'm not entirely certain some of the suggestions in that document are correct!
 Some of the C code certainly isn't!
 Page 17 suggests the use of:
 	#define FLOAT2INTCAST(f)  (*((int *)(&f)))
 for speeding up float comparisons against constants.
 Someone hasn't read up on the C aliasing rules.
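
For illustration, a minimal sketch of one way to get at the bits without
breaking the aliasing rules: copy the object representation with memcpy()
instead of casting the pointer. This assumes a 32-bit IEEE 754 float; the
names float_bits() and float_sign_bit_set() are hypothetical and do not come
from the AMD document.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/*
 * Hypothetical helper: return the bit pattern of f.
 * memcpy() copies the object representation, so no float object is
 * ever accessed through an integer lvalue.
 */
static inline uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Example use: test the sign bit (assumes IEEE 754 single precision). */
static inline int float_sign_bit_set(float f)
{
    return (float_bits(f) >> 31) != 0;
}
```

Compilers commonly reduce a small fixed-size memcpy() like this to a plain
register move, so the well-defined version need not cost anything extra,
though that is worth checking in the generated code.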
 
 Page 106 also suggests you need to be a lot more careful with your write-combining
 code.  Thinking further, it probably can't be used without disabling interrupts (or
 maybe making the write to each cache line a RAS (restartable atomic sequence)).
 (But maybe I'm misunderstanding exactly what happens to the partially written line.)
 E.g. the stuff in appendix B :-)
 
 Page 167 suggests never (ok hardly ever) using the rep string opcodes.
 The algorithm on pages 181+ looks like a good way to kill the I-cache.
 
 Oh, and for good measure, the code has to run on Intel CPUs as well.
 
 	David
 
 -- 
 David Laight: david@l8s.co.uk