Subject: SSE optimized memset
To: None <port-amd64@netbsd.org>
From: Kimura Fuyuki <fuyuki@hadaly.org>
List: port-amd64
Date: 01/18/2007 12:49:37
Hi folks,
I'm now looking at the SSE instruction set and thinking about ways to use it to
speed up some string functions, but it seems quite difficult, at least for
generic use...
Anyway, here is my first try: an SSE-optimized memset.
http://www.hadaly.org/fuyuki/memset.S.patch
The patch above adds a second booster to the current (already well-tuned)
memset implementation. It avoids cache pollution by using the "non-temporal"
variants of the MOV instructions, which store data without pulling it into the
cache. With the normal memset, a calloc() of just a few megabytes completely
trashes the cache contents. That is too harsh for such a limited resource.
Here is a regression test as well (it can actually be dropped into the regress/
tree as-is):
http://www.hadaly.org/fuyuki/memset.tar.bz2
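A test along these lines covers the cases an SSE memset is most likely to get
wrong: every start alignment modulo 16, lengths on both sides of the 16-byte
store size, the return value, and guard bytes just outside the target range
that must stay untouched. This is only a sketch; check_fill and its limits are
hypothetical and unrelated to the contents of memset.tar.bz2.

```c
#include <stddef.h>
#include <string.h>   /* memset: the implementation checked in the usage below */

/* Signature shared by memset and any replacement under test. */
typedef void *(*fill_fn)(void *, int, size_t);

#define MAX_OFF 16   /* 16 consecutive offsets hit every alignment mod 16 */
#define MAX_LEN 256  /* hypothetical bound; raise it to reach long-run paths */
#define GUARD   32   /* bytes on each side that must not be written */

/* Returns the number of (offset, length) cases that failed. */
int check_fill(fill_fn fn)
{
    static unsigned char buf[GUARD + MAX_OFF + MAX_LEN + GUARD];
    int failures = 0;

    for (size_t off = 0; off < MAX_OFF; off++) {
        for (size_t len = 0; len <= MAX_LEN; len++) {
            size_t start = GUARD + off;

            for (size_t i = 0; i < sizeof buf; i++)
                buf[i] = 0xAA;   /* known background pattern */

            /* 0x15C: only the low byte (0x5C) may be stored. */
            if (fn(buf + start, 0x15C, len) != buf + start) {
                failures++;
                continue;
            }

            for (size_t i = 0; i < sizeof buf; i++) {
                int inside = i >= start && i < start + len;
                if (buf[i] != (inside ? 0x5C : 0xAA)) {
                    failures++;
                    break;
                }
            }
        }
    }
    return failures;
}
```

Usage: check_fill(memset) should return 0; pointing it at a replacement
memset exercises the same cases.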
To tell the truth, I don't know the exact calling convention on NetBSD/amd64.
Is it the same as on Linux? If the kernel can freely clobber the xmm registers,
the #ifdef could be removed.
Any reports or suggestions are appreciated. Does the patch work on MP
machines? (I'm testing it on a cheap Celeron...) Does it speed things up? (I
feel some improvement!) Any ideas for tuning?