Subject: memcmp() optimisation on i386
To: None <tech-toolchain@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: tech-toolchain
Date: 10/02/2002 21:49:34

I just noticed that gcc compiles memcmp into a 'repe cmpsb'
whenever the length isn't a constant.
The byte compare will be significantly slower than a 32-bit
compare (especially if the pointers are aligned).

I don't profess to be a gcc wizard, but is it possible
to get something like:

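	movl	%ecx,%edx	# save the byte count
	shrl	$1,%ecx
	shrl	$1,%ecx		# %ecx = number of whole 32-bit words
	repe	cmpsl		# compare a word at a time
	jne	1f		# words differ, flags already set
	movl	%edx,%ecx
	andl	$3,%ecx		# %ecx = 0 to 3 leftover bytes
	repe	cmpsb		# compare the tail
1: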

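used instead?  At least where the result is only tested for
equality, since a 32-bit compare on little-endian i386 does
not give memcmp()'s byte ordering.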
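
For anyone who finds the assembler hard to read, roughly the
same logic in portable C is sketched below.  This is only an
illustration: the name mem_equal() is made up, it reports
equal/not equal rather than an ordering, and it uses memcpy()
for the loads so it doesn't care about alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffers match, 0 otherwise. */
static int
mem_equal(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	size_t nwords = len >> 2;	/* whole 32-bit words */
	size_t tail = len & 3;		/* 0 to 3 leftover bytes */

	while (nwords-- != 0) {		/* the 'repe cmpsl' part */
		uint32_t wa, wb;

		memcpy(&wa, pa, sizeof wa);
		memcpy(&wb, pb, sizeof wb);
		if (wa != wb)
			return 0;
		pa += 4;
		pb += 4;
	}
	while (tail-- != 0) {		/* the 'repe cmpsb' part */
		if (*pa++ != *pb++)
			return 0;
	}
	return 1;
}
```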

Alternatively an inlined function to do 'repe cmpsl'
could be used in the source when it is known that the
length to be compared is a multiple of 4.
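
Something along these lines is what I have in mind, as a rough
sketch using gcc's extended asm.  The name mem_equal4() is made
up, it only answers equal/not equal, and it assumes the
direction flag is clear on entry (as the ABI requires).  On
anything that isn't gcc on x86 it just falls back to memcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the buffers match, 0 otherwise.
 * len must be a multiple of 4.
 */
static inline int
mem_equal4(const void *a, const void *b, size_t len)
{
	assert((len & 3) == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	size_t nwords = len >> 2;
	unsigned char equal;

	if (nwords == 0)
		return 1;	/* repe does nothing when the count is 0 */
	__asm__ volatile (
		"repe cmpsl\n\t"	/* stop at the first differing word */
		"sete	%0"		/* ZF set means every word matched */
		: "=q" (equal), "+S" (a), "+D" (b), "+c" (nwords)
		:
		: "memory", "cc");
	return equal;
#else
	return memcmp(a, b, len) == 0;	/* portable fallback */
#endif
}
```

The explicit zero-length check is there because 'repe' with a
zero count leaves the flags untouched, so the 'sete' would
otherwise pick up whatever the previous instruction left behind.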

	David

-- 
David Laight: david@l8s.co.uk