Subject: Re: Xscale optimisations
To: David Laight <firstname.lastname@example.org>
From: Steve Woodford <email@example.com>
Date: 10/14/2003 13:47:20
On Tuesday 14 October 2003 12:28 pm, David Laight wrote:
> > - significant improvements to some mem*() library functions,
> Are those a real improvement?
> In particular when the code isn't in the I$ ?
I've benchmarked various combinations of micro-optimisations on the
Xscale, and what you see in the current code is what gave the best results.
> Other experiments have shown that they are very often called
> with short transfer lengths, and that the cost of deciding which
> algorithm to use can become dominant.
Yup. The short/misaligned memcpy code is borderline, but in network
throughput tests, it gives a slight improvement.
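To make the short-length trade-off concrete, here is a minimal C sketch of a copy routine that tests for short lengths first, so small transfers pay for only one comparison before copying. It is not the NetBSD implementation. The name sketch_memcpy and the 16-byte SMALL_COPY_MAX threshold are hypothetical; a real cut-off would come from benchmarking like the tests described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-off: copies of at most this many bytes skip all
   alignment analysis. A real value would be chosen by benchmarking. */
#define SMALL_COPY_MAX 16

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Short copies: one compare, then a plain byte loop, so the cost
       of choosing an algorithm cannot dominate. */
    if (n <= SMALL_COPY_MAX) {
        while (n--)
            *d++ = *s++;
        return dst;
    }

    /* Byte-copy until the destination is 4-byte aligned. This consumes
       at most 3 bytes, and n > SMALL_COPY_MAX here, so n cannot underflow. */
    while (((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* If the source is now aligned too, move whole 32-bit words. The
       fixed-size memcpy calls are a portable way to do a word load/store
       without violating strict aliasing; optimising compilers typically
       reduce each one to a single instruction. */
    if (((uintptr_t)s & 3u) == 0) {
        while (n >= 4) {
            uint32_t w;
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Misaligned source: a tuned version would shift and merge words;
       this sketch simply falls through to the byte loop below. */

    /* Copy any remaining tail bytes. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Whether the extra compare and alignment work pay for themselves on longer copies is exactly what throughput tests like the ones above have to decide.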
> Also, IIRC, the StrongARM doesn't execute stmgeia quickly if the
> condition is false. With 16 of them in a row, surely a branch is worth it?
My brief was to optimise for Xscale. If I've added suboptimal code for
non-Xscale CPUs, that's probably because I wasn't as careful with that
part of the code. Volunteers to fix it are more than welcome.
> [using mini D$] ought to benefit SA1100/1110 (110?) systems as well.
> Does anyone know if the SA1100 ever generates a memory burst for an
> stmia write that misses the cache?
See above re. concentrating on Xscale. ;-)
Wasabi Systems Inc. - The NetBSD Company - http://www.wasabisystems.com/