Port-i386 archive


Re: [PATCH] i386 copy.S routine optimization?

On Mon, Jun 10, 2013 at 10:20:25PM +0200, Yann Sionneau wrote:
> Hello,
> I already talked about this with Radoslaw Kujawa on IRC, I understood 
> that it is far from trivial to say if it is good to apply the following 
> patch [0] or not due to x86 cache and pipeline subtleties.

Please inline patches in the mail, that way they are definitely in the
mail archive. It also makes them much easier to review.
Also ensure you quote the CVS revision of the main file - otherwise
the line numbers won't match.

If you want to make a measurable improvement to copystr(), don't use
lodsb or stosb; use 32-bit reads from userspace instead.
I can't remember whether it is best to do misaligned reads and aligned writes
or aligned reads and misaligned writes. In any case, if you do aligned reads
you don't have to worry about faulting at the end of a page, because an
aligned 32-bit read never crosses a page boundary.
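
To make the page-boundary point concrete, here is a small C sketch of the
arithmetic only, not of copystr() itself. The 4096-byte page size and the
names PAGE_BYTES, read_crosses_page() and align_down4() are assumptions
made up for this illustration; they are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration (4 KiB, as on i386). */
#define PAGE_BYTES 4096u

/* True if a read of 'len' bytes starting at 'addr' touches two pages. */
static bool read_crosses_page(uintptr_t addr, unsigned len)
{
    return (addr / PAGE_BYTES) != ((addr + len - 1) / PAGE_BYTES);
}

/* Round an address down to the previous 4-byte boundary. */
static uintptr_t align_down4(uintptr_t addr)
{
    return addr & ~(uintptr_t)3;
}
```

A 4-byte read at 0x1ffe spans bytes 0x1ffe..0x2001 and so touches the next
page, whereas the aligned word containing that byte starts at 0x1ffc and
ends at 0x1fff, inside the same page as the byte the string is known to
occupy.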

Look at the strlen() code for quick ways of testing for a zero byte.
For amd64 the bit masking methods are definitely faster.
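
For reference, a sketch in C of the usual bit-masking test for a zero byte
in a 32-bit word. This is the well-known subtract-and-mask trick, not
necessarily the exact sequence the strlen() code uses; the function names
are made up here, and the byte index assumes a little-endian load as on
x86.

```c
#include <stdint.h>

/*
 * Nonzero iff at least one byte of w is zero.
 * Bytes above the first zero byte may also be flagged (borrow from the
 * subtraction), but the lowest flagged byte is always a real zero.
 */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/*
 * Index (0..3, lowest address first on little-endian) of the first
 * zero byte in w, or -1 if there is none.
 */
static int first_zero_index(uint32_t w)
{
    uint32_t mask = has_zero_byte(w);
    int i;

    for (i = 0; i < 4; i++) {
        if (mask & (0x80u << (8 * i)))
            return i;
    }
    return -1;
}
```

The test costs a subtract, a not and two ands per word, with a single
branch per 4 bytes instead of one per byte.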

Probably the worst part of the current code is that the 'jz' to skip
the unwanted write will be mis-predicted.


David Laight: david%l8s.co.uk@localhost
