port-arm: Re: Kernel copyin/out optimizations for ARM...

Subject: Re: Kernel copyin/out optimizations for ARM...
To: <>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 03/15/2002 16:07:25

On Fri, Mar 15, 2002 at 10:03:17AM +0000, Richard Earnshaw wrote:
> > Looks pretty good, though I haven't tried it.
> 
> However, I wouldn't recommend the use of swp except when a locked transfer 
> is really needed --  it can have nasty cache implications.

Since the weather here is dreary and damp....

I've done some local optimisations:
- Removed the swp
- Filled many of the delay slots
- Removed the 16-byte align code from kcopy

All 3 routines seem to work as copy routines, but my ARM system
doesn't run NetBSD, so I can't test the fault handling.

I have a slight doubt over the copyout code:
	ldmia   r0!, {r4, r5, r6, r14}
	strt    r4, [r1], #4                    /* need user perms here... */
	stmia   r1!, {r5, r6, r14}              /* ... kernel ones ok here */
Now r1 is 16-byte aligned, so the strt and stmia are
guaranteed to be in the same page - so it is basically sound.
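
The same-page argument is just arithmetic, and can be checked on any host
with a few lines of C.  This is only a sketch: the 4096-byte page size and
the names PAGE_SIZE_ASSUMED, same_page and count_crossings are made up for
the illustration and are not taken from bcopyinout.S.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: 4 KB pages */
#define CHUNK 16u                /* one strt word plus three stmia words */

/* Return 1 if the byte range [addr, addr + len) lies within a single page. */
static int same_page(uint32_t addr, uint32_t len)
{
    return (addr / PAGE_SIZE_ASSUMED) ==
           ((addr + len - 1) / PAGE_SIZE_ASSUMED);
}

/*
 * Walk start addresses 0, step, 2*step, ... below limit and count how many
 * 16-byte chunks starting there would straddle a page boundary.
 */
static unsigned count_crossings(uint32_t limit, uint32_t step)
{
    unsigned crossings = 0;

    assert(step != 0);
    for (uint32_t a = 0; a < limit; a += step)
        if (!same_page(a, CHUNK))
            crossings++;
    return crossings;
}
```

With step 16 (the aligned case) no chunk crosses a page; with step 4
(word aligned only) some do, which is why the alignment matters while the
stmia runs with kernel permissions.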

However, if the page gets set 'copy on write' between the strt and
stmia then memory will get corrupted: the stmia runs with kernel
permissions, so it would write to the now-shared page instead of faulting.
For this to happen I think you need (at least) kernel threads (so
a different thread can call exec) and either in-kernel preemption
or a multi-cpu system (to get anything else running at all).

Jason - is this a real possibility anytime in the next 10 years?

The alternative is to replace the stmia with 3 strt instructions.
This is slightly slower, but since the alignment code can then be
removed it will be shorter for small transfers.  I've coded both versions;
defining DONT_USE_LDM_USER will cause the strt (and ldrt in copyin)
instructions to be used.
Might be worth running a benchmark test (of something that does
moderate length copyin/out) to see how much effect it has.
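
Something along these lines might do as a first cut.  It is a user-space
sketch only, and it assumes that small pipe writes and reads go through the
kernel's copyin and copyout paths; the names MAX_LEN, pipe_roundtrip and
seconds_per_roundtrip are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define MAX_LEN 1024  /* assumption: one block never fills the pipe buffer */

/* Push iters blocks of len bytes through a pipe; return bytes read, or -1. */
static long pipe_roundtrip(size_t len, int iters)
{
    char buf[MAX_LEN];
    int fds[2];
    long total = 0;

    assert(len > 0 && len <= MAX_LEN);
    if (pipe(fds) != 0)
        return -1;
    memset(buf, 0xA5, len);

    for (int i = 0; i < iters && total >= 0; i++) {
        /* write(): the kernel copies the block in from user space */
        if (write(fds[1], buf, len) != (ssize_t)len) {
            total = -1;
            break;
        }
        /* read(): the kernel copies the block back out to user space */
        size_t got = 0;
        while (got < len) {
            ssize_t n = read(fds[0], buf + got, len - got);
            if (n <= 0) {
                total = -1;
                break;
            }
            got += (size_t)n;
        }
        if (total >= 0)
            total += (long)got;
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}

/* Average wall-clock seconds per write+read pair, or -1.0 on failure. */
static double seconds_per_roundtrip(size_t len, int iters)
{
    struct timespec t0, t1;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (pipe_roundtrip(len, iters) < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((double)(t1.tv_sec - t0.tv_sec) +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e9) / iters;
}
```

Calling seconds_per_roundtrip(256, 100000) on a kernel built with and
without DONT_USE_LDM_USER would give a rough comparison, though the pipe
and syscall overhead will dilute whatever difference the copy loops make.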

Anyway, the new file is on www.l8s.co.uk, then netbsd/bcopyinout.S

	David

-- 
David Laight: david@l8s.co.uk