Subject: Re: memcmp() optimisation on i386
To: <>
From: David Laight <david@l8s.co.uk>
List: tech-toolchain
Date: 10/08/2002 10:03:28
> I noticed that it does the same even when the length is a
> constant.

If the length is constant and a multiple of 4, and only equality is
being tested for, then there is no excuse...
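
As a rough sketch of that equality-only case in C (equal_words is a
made-up name, not anything in libc, and it assumes the caller really
does guarantee a length that is a multiple of 4), the compare can go a
32-bit word at a time with no byte fix-up at all:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a libc function: returns nonzero when the
 * two buffers are equal.  Assumes len is a multiple of 4.
 */
static inline int
equal_words(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	uint32_t wa, wb;

	assert(len % 4 == 0);
	for (; len != 0; len -= 4, pa += 4, pb += 4) {
		/*
		 * memcpy keeps the loads legal for unaligned buffers;
		 * compilers normally turn each into one 32-bit load.
		 */
		memcpy(&wa, pa, 4);
		memcpy(&wb, pb, 4);
		if (wa != wb)
			return 0;
	}
	return 1;
}
```

With a constant len a compiler has a fair chance of unrolling this
completely, which is the point of the complaint above.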
 
> Is
> 
> > 	shrl	$1,%ecx
> > 	shrl	$1,%ecx
> 
> faster than
> 
> 	shrl	$2,%ecx
> 
> ?

Probably yes!  Definitely on a P4.

> 
> > Alternatively an inlined function to do 'repe cmpsl'
> > could be used in the source when it is known that the
> > length to be compared is a multiple of 4.
> 
> Since it returns difference between two differing bytes when two
> strings are not the same, some more work would be needed if repe cmpsl
> was used.

True - subtract 4 from SI and DI, add 4 to CX and do the byte compare.
Except that the startup cost of cmps is so big you don't really
want to do it for a small number of bytes! [1]
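
The fix-up can be sketched in C roughly as follows. This is only an
illustration under assumptions: memcmp_words is a hypothetical name, the
length is assumed to be a multiple of 4, and a plain loop stands in for
'repe cmpsl'. Once a word differs, the four bytes of that word are
rescanned to get the memcmp()-style byte difference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: memcmp()-style result for buffers whose length
 * is assumed to be a multiple of 4.
 */
static inline int
memcmp_words(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	uint32_t wa, wb;
	int i;

	assert(len % 4 == 0);
	for (; len != 0; len -= 4, pa += 4, pb += 4) {
		memcpy(&wa, pa, 4);
		memcpy(&wb, pb, 4);
		if (wa == wb)
			continue;
		/*
		 * pa/pb still point at the differing word here; the
		 * asm version has to step back 4 bytes after cmpsl.
		 * Rescan bytewise so the sign follows byte order and
		 * not the little-endian word value.
		 */
		for (i = 0; i < 4; i++)
			if (pa[i] != pb[i])
				return pa[i] - pb[i];
	}
	return 0;
}
```

The byte rescan only runs once, on the first differing word, so the
extra cost is bounded at four byte compares.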

> And it would make 4-byte comparison less optimal.

But I guess memcmp isn't often used for 4 byte buffers!
Using cmpsl for a 4 byte compare is already sub-optimal!

	David

[1] Which is why an explicit name[1] != search[1] prior to the
inlined 'rep cmpsb' is such a big gain...
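
In C terms the early-out looks something like the sketch below.
name_differs is a hypothetical helper, the index 1 is simply kept from
the expression above, and both buffers are assumed to hold at least two
bytes. memcmp() stands in for the inlined string compare:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup helper: nonzero when name and search differ.
 * Assumes len >= 2 so that index 1 is valid in both buffers.
 */
static inline int
name_differs(const char *name, const char *search, size_t len)
{
	assert(len >= 2);
	/*
	 * Cheap reject first: any name that differs at this byte
	 * never pays the start-up cost of the full compare.
	 */
	if (name[1] != search[1])
		return 1;
	return memcmp(name, search, len) != 0;
}
```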

-- 
David Laight: david@l8s.co.uk