port-arm32: Re: cpu_switch (was Re: 1.5 Release documentation ...)

Subject: Re: cpu_switch (was Re: 1.5 Release documentation ...)
To: Chris Gilbert <chris@buzzbee.freeserve.co.uk>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm32
Date: 11/07/2000 10:56:21

> Where would you recommend working to get better performance?  At one time the 
> pmap_* stuff was a mess, and that was expected to be improved.  And I note 
> that you did some work on it, so is it safe to assume that pmap is not a 
> problem any more?

IMHO it's still a mess and needs a rewrite from the ground up.  I've done 
some hacks at home and managed to remove approximately 90% of the cache 
flush calls from some routines; but we are still flushing the cache far 
too often and the impact of the changes I've made is not as significant as 
one might expect from the headline figure.

I haven't had a chance to look at the code for a couple of months, but IIRC 
there are still two primary sources of cache flushes: context switch 
(unavoidable) and process death.  The latter is a pain, because with the 
current UVM code we have to flush the cache on each call to pmap_remove 
and this is called once per UVM object (remember each process will contain 
at least 3 objects -- code, data and stack), so we have to flush the cache 
3 times (instead of once) each time a process dies.  I suspect that to fix 
this, however, we will need to make some changes to the generic UVM code.  
Sadly, from looking at the UVM code, it doesn't look as though 
pmap_update() is likely to work any more (I could only find one call to it 
in the entire UVM source), so it isn't currently possible to build a 
task-list of things that need to be done.
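
To make the cost concrete, here is a minimal toy model in C. It is not 
NetBSD code: every name in it (toy_remove_eager, toy_remove_deferred, 
toy_update, the flush counter) is hypothetical and only mimics the 
pattern described above.  It compares flushing on every removal with 
queuing the work and flushing once at a later commit point, which is 
the role one would want pmap_update() to play.

```c
#include <assert.h>

/* Toy model only: all names here are hypothetical, not the NetBSD pmap API. */

static int flush_count;   /* full cache flushes performed so far */
static int flush_pending; /* non-zero when a deferred flush is owed */

static void cache_flush(void)
{
    flush_count++;
}

/* Eager scheme: each removal flushes the cache immediately. */
static void toy_remove_eager(const char *object)
{
    (void)object;
    cache_flush();
}

/* Deferred scheme: a removal only records that a flush is needed. */
static void toy_remove_deferred(const char *object)
{
    (void)object;
    flush_pending = 1;
}

/* Commit point: perform at most one flush for all queued removals. */
static void toy_update(void)
{
    if (flush_pending) {
        cache_flush();
        flush_pending = 0;
    }
}

/* Tear down a process with three objects; return the flushes it cost. */
static int teardown_eager(void)
{
    flush_count = 0;
    toy_remove_eager("code");
    toy_remove_eager("data");
    toy_remove_eager("stack");
    return flush_count;
}

static int teardown_deferred(void)
{
    flush_count = 0;
    toy_remove_deferred("code");
    toy_remove_deferred("data");
    toy_remove_deferred("stack");
    toy_update();
    assert(flush_pending == 0); /* nothing left queued after the commit */
    return flush_count;
}
```

In this model teardown_eager() costs three flushes and 
teardown_deferred() costs one.  The sketch assumes the caller reliably 
reaches the commit point after the last removal, which is exactly what 
the generic code does not appear to guarantee at present.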

> 
> Are there other areas that need work?  (I know that an RPC will never fly 
> because of the memory speed)  I note from the TODO that things like irq 
> delivery need optimising, there's other things in there as well that may be 
> worth looking at.

Well, I noticed last night that we aren't running a separate statclock for 
gathering statistics -- I've no idea why not, but if there is no technical 
reason for not doing so, adding this shouldn't be hard. (See iomd_clock.c)

> 
> Is the best way to find out to get a profiling kernel, and put some load onto 
> it? (first twisting the machine/asm.h ENTRY macro to do profiling info) 

Yes, and it should be possible to do this now that the profiling stub can 
be called from assembler.

R.