Subject: Re: copyin/out
To: Allen Briggs <>
From: Chris Gilbert <>
List: port-arm
Date: 08/09/2002 10:16:52
----- Original Message -----
From: "Allen Briggs" <>
To: <>
Sent: Friday, August 09, 2002 4:41 AM
Subject: copyin/out

> Hi,
> I've been working on a new copyin/copyout/kcopy that's significantly
> better in some caching modes on the XScale and slightly better in
> others.
> My three main concerns are:
> 1) how does it work on other ARM architectures
> 2) is the code too large for the more limited
>    of the arm32 archs?
> 3) Are there large, unaligned data copies going
>    through the copyin/copyout path?
> Basically, I've ditched the pte scan and I'm using ldr[b]t and str[b]t
> to access user data.  I've also unrolled some loops and I've put in
> some code to prefetch with the 'pld' instruction on XScale (if we can
> define something like __ARM_v5EDSP or something, we could use that).
> This does allow us to garbage-collect cowfault(), too.
> (I've done some profiling with the new pmc(9) facilities)
> Similar changes can be made to fusu.S, I believe--perhaps with more of
> a gain there.
> So, what do more experienced ARM-heads have to say about the attached
> bcopyinout.S ?

From a quick look over it: do you need to preload the addresses you're
storing to?  Or does that cause it to fetch the TLB entries for speed?  I.e.,
aren't you just filling the cache with stuff you're about to overwrite?
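
To make the question concrete, here is a rough C sketch of a copy loop that
hints only the source side.  It is not taken from bcopyinout.S: the names
copy_prefetch_src, LINE_SIZE and PREFETCH_AHEAD are made up for illustration,
the 32-byte line size and 64-byte distance are assumptions, and
__builtin_prefetch is the GCC builtin, which the compiler may or may not turn
into 'pld' depending on the target.

```c
#include <stddef.h>

/* Assumed values for illustration only; tune per cache architecture. */
#define LINE_SIZE	32	/* assumed cache line size in bytes */
#define PREFETCH_AHEAD	64	/* assumed prefetch distance in bytes */

/*
 * copy_prefetch_src: hypothetical helper, not part of bcopyinout.S.
 * Copies len bytes from src to dst and issues read hints for the source
 * only, leaving the destination lines alone since they are about to be
 * overwritten anyway.
 */
void
copy_prefetch_src(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < len; i++) {
		/* One hint per assumed line, kept inside the source buffer. */
		if ((i % LINE_SIZE) == 0 && i + PREFETCH_AHEAD < len)
			__builtin_prefetch(s + i + PREFETCH_AHEAD, 0, 0);
		d[i] = s[i];
	}
}
```

Whether dropping the destination preload helps or hurts presumably depends on
the caching mode, so this is only meant to show the variant I'm asking about.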

> With this, I'm seeing copyout run at about 63MB/s on a simple test
> (dd if=/dev/zero of=/dev/null count=1024 bs=1024k).

Hmm, I see near enough that already on cats 1.6D.
1073741824 bytes transferred in 17.343 secs (61912115 bytes/sec)
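
For comparing the two figures, the arithmetic behind that dd line can be
sketched in C as below.  The helper names are made up for illustration, and
whether "63MB/s" means 10^6 or 2^20 bytes is an open assumption, so both
conversions are shown.

```c
#include <stdint.h>

/* Hypothetical helper: whole bytes per second, fraction truncated. */
uint64_t
bytes_per_sec(uint64_t bytes, double secs)
{
	return (uint64_t)((double)bytes / secs);
}

/* Hypothetical helper: rate in units of 10^6 bytes per second. */
double
mb_per_sec(uint64_t bytes, double secs)
{
	return (double)bytes / secs / 1000000.0;
}

/* Hypothetical helper: rate in units of 2^20 bytes per second. */
double
mib_per_sec(uint64_t bytes, double secs)
{
	return (double)bytes / secs / 1048576.0;
}
```

With 1073741824 bytes and 17.343 secs this gives 61912115 bytes/sec, which is
about 61.9 in 10^6-byte units or about 59.0 in 2^20-byte units.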

I'll update my src (been away a couple of days) and drop in your new
copyin/copyout to see if there's a gain or not.  (Optimization is always
going to be fun with so many different cache architectures 8)

Or rather, I would update my src, but it seems to be having disk space
issues.  I'll see if it wants to play ball this evening.