> I've committed improved memcpy and memset routines to the i386 libkern
> source tree. These are 'rep xxx' based routines, but they are significantly
> faster than the old routines for short transfers and slightly faster for
> long transfers [1].

I'm wondering if the benchmarks you used could be committed to src/regress?

.mrg.
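For readers unfamiliar with the technique, here is a minimal sketch of what a 'rep xxx' based memcpy/memset can look like on x86. It assumes GCC-style inline assembly. It is not the committed libkern code. The names sketch_memcpy and sketch_memset are hypothetical. Splitting the copy into longwords followed by a byte tail is one common layout, and it may not match the commit. On other architectures the sketch falls back to plain byte loops so that it still compiles.

```c
#include <stddef.h>

/*
 * Hypothetical sketch_memcpy: copy n/4 longwords with "rep movsl",
 * then the remaining 0-3 bytes with "rep movsb".
 */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    size_t longs = n >> 2;   /* number of 4-byte moves */
    size_t bytes = n & 3;    /* trailing 0-3 bytes */

    /* The i386 and amd64 ABIs guarantee the direction flag is clear
       (forward copy) on function entry, so no "cld" is issued here. */
    __asm__ volatile ("rep movsl"
                      : "+D" (d), "+S" (s), "+c" (longs)
                      :
                      : "memory");
    __asm__ volatile ("rep movsb"
                      : "+D" (d), "+S" (s), "+c" (bytes)
                      :
                      : "memory");
#else
    /* Portable fallback on non-x86 targets: plain byte loop. */
    while (n-- > 0)
        *d++ = *s++;
#endif
    return dst;
}

/* Hypothetical sketch_memset: "rep stosb" stores AL (low byte of c) n times. */
void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("rep stosb"
                      : "+D" (d), "+c" (n)
                      : "a" (c)
                      : "memory");
#else
    while (n-- > 0)
        *d++ = (unsigned char)c;
#endif
    return dst;
}
```

A benchmark for short transfers would typically time these routines in a loop over small sizes (for example 0 to 64 bytes), because that is where per-call setup cost dominates.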