Subject: Re: memcmp() optimisation on i386
To: David Laight <david@l8s.co.uk>
From: Bang Jun-Young <junyoung@mogua.com>
List: tech-toolchain
Date: 10/06/2002 18:17:14
On Wed, Oct 02, 2002 at 09:49:34PM +0100, David Laight wrote:
> I just noticed that gcc compiles memcmp into a 'repe cmpsb'
> whenever the length isn't a constant.

I noticed that it does the same even when the length is a
constant.

> The byte compare will be significantly slower than a 32bit
> compare (especially if the pointers are aligned).
> 
> I don't profess to be a gcc wizard, but is it possible
> to get something like:
> 
> 	movl	%ecx,%edx

Is

> 	shrl	$1,%ecx
> 	shrl	$1,%ecx

faster than

	shrl	$2,%ecx

?

> Alternatively an inlined function to do 'repe cmpsl'
> could be used in the source when it is known that the
> length to be compared is a multiple of 4.

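Since memcmp() returns the difference between the first two differing
bytes when the two buffers are not the same, some more work would be
needed if repe cmpsl were used: after a word mismatch, the differing
byte within that word still has to be found. That extra step would also
make a 4-byte comparison less optimal.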
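
To illustrate the extra work, here is a rough C sketch of a word-at-a-time
compare. It is only an illustration, not what gcc emits and not the libc
code; the name memcmp_words is made up for this example, and the
byte-difference return value is the behaviour described above rather than
something the C standard requires (the standard only fixes the sign).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc or gcc): compare n bytes,
 * four at a time where possible, and return the difference between
 * the first pair of differing bytes, or 0 if the buffers are equal.
 */
static int
memcmp_words(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	size_t i, j;

	/* Word loop: roughly the part that 'repe cmpsl' would do. */
	for (i = 0; i + 4 <= n; i += 4) {
		uint32_t wa, wb;

		/* memcpy avoids alignment and aliasing problems. */
		memcpy(&wa, pa + i, sizeof(wa));
		memcpy(&wb, pb + i, sizeof(wb));
		if (wa == wb)
			continue;

		/*
		 * The extra work: the words differ, but the result must
		 * come from the first differing byte in memory order, so
		 * the mismatching word is rescanned byte by byte.
		 */
		for (j = i; j < i + 4; j++) {
			if (pa[j] != pb[j])
				return (int)pa[j] - (int)pb[j];
		}
	}

	/* Tail: any bytes left over when n is not a multiple of 4. */
	for (; i < n; i++) {
		if (pa[i] != pb[i])
			return (int)pa[i] - (int)pb[i];
	}
	return 0;
}
```

For a plain 4-byte compare the word loop runs once, and on a mismatch the
byte rescan is pure overhead compared with comparing the bytes directly.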

Jun-Young

-- 
Bang Jun-Young <junyoung@mogua.com>