port-arm: Re: Kernel copyin/out optimizations for ARM...

Subject: Re: Kernel copyin/out optimizations for ARM...
To: None <port-arm@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 03/18/2002 11:55:53

On Sat, Mar 16, 2002 at 04:47:19PM -0800, Jason R Thorpe wrote:
> 
> Well, it doesn't work :-)
> 
> First of all, I had to futz with the numbered-labels.  Our ELF assembler
> doesn't like e.g. 11$.  I'm pretty sure I got the translation to gas-style
> numbered-labels right :-)

Interesting, the copy of gas I was using managed all right
- though it is a rather old copy....
OTOH gas usually does not want the '$', and wants a 'b' or 'f'
after the number in the jump.
(The 'b' and 'f' forms let you reference a numbered label across
intervening named labels, without needing the 'ENABLE LSB' (local
symbol block) of the MACRO-11 assembler).
> 
> Secondly, after managing to fork a few processes on an SA-110, it
> croaks:
> 
> panic: kernel diagnostic assertion "umap->refcount != 0" failed: file "/u1/netbsd/src/sys/arch/shark/compile/SHAG-SHARK/../../../../uvm/uvm_bio.c", line 253
> 
> Looks like it's not honoring write-protection properly somewhere.

That is probably confirmed by the 'USER_PERMS_ALWAYS' version working.
OTOH I might have got one of the boundary conditions wrong, and
be missing the last word, or copying an extra one somewhere.
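
One way to rule out the boundary-condition theory is to run the copy
loop in user space against guard bytes.  The following is only a sketch:
copy_under_test() is a hypothetical stand-in written in C, not the
actual assembler routine, and GUARD, MAXLEN and PAD are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD  0xa5	/* fill value for bytes that must stay untouched */
#define MAXLEN 64	/* longest copy tried (assumed, arbitrary) */
#define PAD    8	/* guard area and range of misalignments tried */

/* Hypothetical stand-in for the routine being checked:
 * whole words first, then a byte tail. */
void copy_under_test(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(dst != NULL && src != NULL);
	while (len >= sizeof(unsigned long)) {
		unsigned long w;

		memcpy(&w, s, sizeof(w));	/* tolerate unaligned pointers */
		memcpy(d, &w, sizeof(w));
		d += sizeof(w);
		s += sizeof(w);
		len -= sizeof(w);
	}
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}

/* Returns 0 if every offset/length pair wrote exactly len bytes,
 * -1 on the first missing or extra byte. */
int check_boundaries(void)
{
	unsigned char src[MAXLEN + PAD];
	unsigned char dst[MAXLEN + 3 * PAD];
	size_t off, len, i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i % 97 + 1);	/* never equals GUARD */

	for (off = 0; off < PAD; off++) {
		for (len = 0; len <= MAXLEN; len++) {
			size_t start = PAD + off;

			memset(dst, GUARD, sizeof(dst));
			copy_under_test(dst + start, src + off, len);

			for (i = 0; i < sizeof(dst); i++) {
				unsigned char want = GUARD;

				if (i >= start && i < start + len)
					want = src[off + (i - start)];
				if (dst[i] != want)
					return -1;
			}
		}
	}
	return 0;
}
```

A missing last word shows up as a GUARD byte inside the copied range,
and an extra one as a non-GUARD byte just past it.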

I've also had some further thoughts about this sort of optimisation.
The ARM processors have relatively small instruction caches, so
it is fairly unlikely that code that isn't being called within a
loop will actually be in the cache.  Additionally, the cost of
reading an instruction from memory is several times the cost
of executing it.

In particular this means that converting:
	while (condition)
		statement;
into:
	if (condition)
		do
			statement;
		while (condition);
(in order to save an unconditional branch) is probably almost never
worthwhile on a modern CPU: the second form tests the condition in two
places, so there are more instructions to fetch into the cache.
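
For concreteness, here are the two forms as C, using a word copy as the
loop body.  This is only an illustration: the function names are made
up, and a compiler is free to turn either form into the other, so the
difference really only matters for hand-written assembler.

```c
#include <assert.h>
#include <stddef.h>

/* Plain form: one test at the top of the loop, with an
 * unconditional branch back to it after each pass. */
size_t copy_words_while(unsigned long *dst, const unsigned long *src,
    size_t n)
{
	size_t copied = 0;

	assert(dst != NULL && src != NULL);
	while (n > 0) {
		*dst++ = *src++;
		n--;
		copied++;
	}
	return copied;
}

/* Inverted form: the test appears twice (once as a guard, once at
 * the bottom), which removes the unconditional branch from the loop
 * but adds instructions that have to be fetched. */
size_t copy_words_inverted(unsigned long *dst, const unsigned long *src,
    size_t n)
{
	size_t copied = 0;

	assert(dst != NULL && src != NULL);
	if (n > 0) {
		do {
			*dst++ = *src++;
			n--;
			copied++;
		} while (n > 0);
	}
	return copied;
}
```

Both behave identically, including for n == 0; the only difference is
the code size versus the branch per iteration.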

Do you know (offhand) how big the copyin that is done for the
arguments on every system call is?
I optimised for single words and moderate length buffers.
Perhaps optimising for short buffers (say up to 32 bytes) and
long transfers would be better.
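
Roughly what I have in mind, as a C sketch rather than the real
assembler.  All the names here are hypothetical, and the 32 byte
threshold is just the guess above, not something I have measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_COPY_MAX 32	/* assumed threshold, not measured */

/* Short path: as few instructions as possible, no alignment setup. */
void copy_short(unsigned char *d, const unsigned char *s, size_t len)
{
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}

/* Long path: worth paying the setup cost to align the destination
 * and then move whole words, with a byte tail at the end. */
void copy_long(unsigned char *d, const unsigned char *s, size_t len)
{
	while (len > 0 && ((uintptr_t)d & (sizeof(uint32_t) - 1)) != 0) {
		*d++ = *s++;
		len--;
	}
	while (len >= sizeof(uint32_t)) {
		uint32_t w;

		memcpy(&w, s, sizeof(w));	/* source may be unaligned */
		memcpy(d, &w, sizeof(w));
		d += sizeof(w);
		s += sizeof(w);
		len -= sizeof(w);
	}
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}

/* Pick a path on length alone, so short copies never touch the
 * code for the long path. */
void copy_dispatch(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	if (len <= SHORT_COPY_MAX)
		copy_short(dst, src, len);
	else
		copy_long(dst, src, len);
}
```

The point of the split is that the common short case stays small enough
to have a chance of being in the instruction cache.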

	David

-- 
David Laight: david@l8s.co.uk