Subject: Re: Xscale optimisations
To: Steve Woodford <scw@wasabisystems.com>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 10/14/2003 12:28:13
> 	- significant improvements to some mem*() library functions,

Are those a real improvement?
In particular when the code isn't in the I$?

Other experiments have shown that they are very often called
with short transfer lengths, and that the cost of deciding which
algorithm to use can become dominant.
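
A rough sketch of the sort of thing I mean, in C rather than assembler.
The name copy_short_first and the threshold of 16 are made up for
illustration (they are not from the patch), and the right cut-off would
have to be measured:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; the right value would have to be measured. */
#define SHORT_COPY_MAX 16

/* Hypothetical wrapper for illustration, not the code from the patch. */
static void *copy_short_first(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (len <= SHORT_COPY_MAX) {
		/* Short case: one compare, then a plain byte loop. */
		while (len--)
			*d++ = *s++;
		return dst;
	}

	/* Long case: only now pay for alignment checks and algorithm choice. */
	return memcpy(dst, src, len);
}
```

The point is only that the short case is decided by the first compare,
so it never touches the code that selects between the larger algorithms.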

Also, IIRC, the StrongARM doesn't execute stmgeia quickly if the condition
is false.  Having 16 in a row must be worth a branch?

> 	- caching the kernel stack and pcb in xscale's mini data cache.
> 	  Even though this cache is used for copy/zero page (which effectively
> 	  clear the mini D$ each time they're called), this is still a win as
> 	  copy/zero page ops are not all that common, relatively speaking.

That ought to benefit SA1100/1110 (110?) systems as well.

Does anyone know if the SA1100 ever generates a memory burst for a stmia
write that misses the cache?

	David

-- 
David Laight: david@l8s.co.uk