Subject: Re: copyin/out
To: <>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 08/09/2002 20:31:34
> 
> >  > Hmm, I see near enough that already on cats 1.6D.
> >  > 1073741824 bytes transferred in 17.343 secs (61912115 bytes/sec)
> 
> That's about the same gain I get on cats:
> 1073741824 bytes transferred in 11.443 secs (93833944 bytes/sec)
> >
> > Interesting.  The performance characteristics of the old code were
> > VERY different on a 400MHz i80321 (XScale core).  Indeed the old code
> > on my Shark can do:
> >
> > 1073741824 bytes transferred in 15.120 secs (71014670 bytes/sec)
> >
> > and the new code on the Shark yields:
> >
> > 1073741824 bytes transferred in 8.447 secs (127115167 bytes/sec)

Clearly a useful improvement...

However, those figures are for very large transfers.
What happens for small transfers, where all the 'red tape'
becomes more significant than the innermost loop?
Even transfers of 128 or 512 bytes might be significantly
slower with the new code.
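
One way to look at this is to repeat the measurement with the block
size shrinking while the total stays fixed.  What follows is only a
sketch: it assumes the figures quoted above came from dd copying
/dev/zero to /dev/null (the thread does not say), and TOTAL and the
list of block sizes are arbitrary choices, not values from the thread.

```shell
#!/bin/sh
# Sketch only. Assumption: the quoted figures came from something like
#   dd if=/dev/zero of=/dev/null bs=... count=...
# TOTAL is an arbitrary 64 MiB so that the small block sizes finish quickly;
# raise it for more stable timings.
TOTAL=67108864

small_block_sweep() {
    # Same number of bytes each time, so only the per-call overhead changes.
    for bs in 65536 4096 512 128; do
        count=$((TOTAL / bs))
        echo "bs=$bs count=$count"
        # dd prints its statistics on stderr; keep only the summary line.
        dd if=/dev/zero of=/dev/null bs="$bs" count="$count" 2>&1 | tail -n 1
    done
}

small_block_sweep
```

If the fixed cost per call dominates, the bytes/sec figure should fall
off sharply at the 512 and 128 byte sizes, and comparing old and new
kernels at those sizes shows whether the new code costs anything there.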

I expect that certain workloads contain a significant number
of short transfers.  It is probably possible to test misaligned
transfers using mismatched ibs and obs values.
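
A rough sketch of such a test, again assuming dd between /dev/zero and
/dev/null.  The sizes 511 and 512 and the total are arbitrary picks.
Whether the mismatch really gives the kernel misaligned user addresses
depends on how the particular dd manages its buffers, so that needs
checking before reading much into the numbers.

```shell
#!/bin/sh
# Sketch only. IBS, OBS and TOTAL are illustrative values, not from the thread.
TOTAL=67108864
IBS=511     # odd input block size
OBS=512     # output block size, deliberately different from IBS

mismatched_run() {
    # count applies to input blocks, so scale it by IBS.
    count=$((TOTAL / IBS))
    echo "ibs=$IBS obs=$OBS count=$count"
    dd if=/dev/zero of=/dev/null ibs="$IBS" obs="$OBS" count="$count" 2>&1 | tail -n 1
}

matched_run() {
    # Baseline with equal input and output block sizes.
    count=$((TOTAL / OBS))
    echo "bs=$OBS count=$count"
    dd if=/dev/zero of=/dev/null bs="$OBS" count="$count" 2>&1 | tail -n 1
}

matched_run
mismatched_run
```

The mismatched run also makes dd copy between its own buffers, so the
difference between the two runs is not purely kernel copy time.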

	David

-- 
David Laight: david@l8s.co.uk