Subject: Re: assembly optimisations
To: None <murray@river-styx.org>
From: David Laight <david@l8s.co.uk>
List: port-amd64
Date: 09/19/2007 22:53:54
On Wed, Sep 19, 2007 at 03:30:52PM +1000, murray@river-styx.org wrote:
>
>    I have been playing around a lot with various assembly optimisations for
> low level functions like memcpy, memset, strcmp etc.
>
>    I have concluded from lots of profiling that the implementations as
> found in opensolaris are the best by far. They are generally written by
> AMD themselves and I guess have some work by opensolaris related folk.

I've not looked at these routines, but when benchmarking you have to
be extremely careful if you want to get figures that are in any way
comparable to what happens in real life.

Not only are there issues with the buffers already being in the data
cache (a tight benchmark loop keeps them there), but in real life the
code itself won't always be resident, and the branch prediction tables
won't be 'primed'. There is also the effect of displacing other code
out of the i-cache with unrolled or other large-footprint algorithms.
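As a rough illustration of the data-cache point only, here is a minimal C sketch that times the same memcpy "warm" (buffers left in cache by the previous run) and "cold" (caches flushed first by touching a large scratch buffer). The helper names evict_caches, time_copy and best_of are hypothetical, the 64-byte cache line is an assumption (typical on amd64, not guaranteed), and the caller must pick an eviction buffer comfortably larger than the last-level cache. It uses POSIX clock_gettime(CLOCK_MONOTONIC). It does not model a cold i-cache, unprimed branch predictors, or i-cache displacement; repeated warm runs will in fact prime the predictors, which is part of why they flatter unrolled code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Assumed cache-line size; 64 bytes is typical on amd64. */
#define LINE_SIZE 64

/* Hypothetical helper: write one byte per cache line across a buffer
 * much larger than the caches, so previously used src/dst lines are
 * likely evicted. volatile stops the compiler dropping the stores. */
static void evict_caches(unsigned char *buf, size_t len)
{
    volatile unsigned char *p = buf;
    for (size_t i = 0; i < len; i += LINE_SIZE)
        p[i]++;
}

/* Hypothetical helper: time one memcpy of n bytes, in nanoseconds.
 * If evict is non-NULL the caches are flushed first ("cold" call);
 * if it is NULL the buffers stay cached from the last run ("warm"). */
static double time_copy(void *dst, const void *src, size_t n,
                        unsigned char *evict, size_t evict_len)
{
    struct timespec t0, t1;

    if (evict != NULL)
        evict_caches(evict, evict_len);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}

/* Hypothetical helper: best (minimum) of k trials, warm or cold. */
static double best_of(int k, void *dst, const void *src, size_t n,
                      unsigned char *evict, size_t evict_len)
{
    double best = -1.0;

    for (int i = 0; i < k; i++) {
        double t = time_copy(dst, src, n, evict, evict_len);
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}
```

Comparing best_of(k, dst, src, n, NULL, 0) against best_of(k, dst, src, n, evict, evict_len) for a candidate routine gives some idea of how much of its advantage depends on the buffers already being cached.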

	David

-- 
David Laight: david@l8s.co.uk