Port-powerpc archive


Re: 16KB page ibm4xx performance



On Wed, 18 Aug 2010, Masao Uebayashi wrote:

> On Mon, Aug 16, 2010 at 05:23:56PM +0900, Masao Uebayashi wrote:
> > I'm testing XIP on OpenBlockS266 (405GPr).  It works, but it seems very
> > slow.  If I disable neighbor fault for MADV_NORMAL, /bin/ksh as XIP
> > starts up twice as fast, and time -l shows half as many page reclaims.
> > 
> > I guess this platform needs serious TLB / cache tuning...
> 
> There is a mixture of problems:
> 
> - Our PowerPC ELF has RWX .data/.got/.plt.  If programs' .data sections
>   are not aligned to 16KB, those are mapped as "overlay"; pages are always
>   copied.
> 
> - Mapping executable pages by pmap_enter() is very expensive because of the
>   __syncicache() operation.
> 
> - Neighbor fault tends to cause TLB shortage.  This is bad especially if exec
>   mappings become victims.
> 
> I can get useful speed on 405GPr now by doing the following:
> 
> - Use static binaries.
> 
> - Disable neighbor faults.
> 
> *
> 
> I'd recommend all 405GPr users to use only static userland, so you
> get 2x speed...

Actually, let me make a couple of suggestions.

1) Instead of static linking, try changing the ELF page size to something 
larger than 64KB; that way you don't have issues with the different sections 
crossing page boundaries.

2) The TLB supports multiple page sizes.  Add support for multiple page 
sizes to the OS.  (Probably a largish project.)

3) The TLB page replacement algorithm is all done in software.  Tweak it.

4) Optimize pmap so that expensive operations like __syncicache() are only 
done in pmap_update() and only if needed.
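
A minimal sketch of what 4) might look like, purely as an illustration and not the actual 4xx pmap code.  Everything prefixed toy_ or TOY_ is hypothetical: toy_syncicache() stands in for the real __syncicache() and only counts calls, and the pending list and its size are assumptions made for the example.  The enter path just records which executable pages need a sync, and the update path does the expensive work once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   16384u  /* assumption: 16KB pages, as in this thread */
#define TOY_MAX_PENDING 8       /* hypothetical size of the pending list */

struct toy_pmap {
    uintptr_t pending[TOY_MAX_PENDING]; /* exec pages waiting for a sync */
    size_t    npending;
    unsigned  sync_calls;               /* how many simulated syncs ran */
};

/* Stand-in for the expensive __syncicache(); it only counts calls. */
static void
toy_syncicache(struct toy_pmap *pm, uintptr_t va, size_t len)
{
    (void)va;
    (void)len;
    pm->sync_calls++;
}

/* Where pmap_update() would run: sync whatever is pending, once per page. */
static void
toy_pmap_update(struct toy_pmap *pm)
{
    for (size_t i = 0; i < pm->npending; i++)
        toy_syncicache(pm, pm->pending[i], TOY_PAGE_SIZE);
    pm->npending = 0;
}

/* Where pmap_enter() would run: remember executable pages, do not sync. */
static void
toy_pmap_enter(struct toy_pmap *pm, uintptr_t va, bool exec)
{
    if (!exec)
        return;                         /* data pages need no icache sync */
    va &= ~(uintptr_t)(TOY_PAGE_SIZE - 1);
    for (size_t i = 0; i < pm->npending; i++)
        if (pm->pending[i] == va)
            return;                     /* already queued */
    if (pm->npending == TOY_MAX_PENDING)
        toy_pmap_update(pm);            /* list full: flush early */
    assert(pm->npending < TOY_MAX_PENDING);
    pm->pending[pm->npending++] = va;
}
```

With this shape, several toy_pmap_enter() calls that touch the same executable page cost one sync at toy_pmap_update() time, and a batch with no executable mappings costs none.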

I never really spent much time optimizing the 4xx pmap.  I was much more 
concerned about the copyin/copyout code, which is still a lot slower than I 
would like.
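
Going back to 1), the alignment question can be checked with a little arithmetic.  This is only a sketch: the helper names are made up for the example, and it says nothing about how the ELF loader itself decides what to copy.  It just tests whether a section's start address is a multiple of a given page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two (page sizes always are). */
static bool
is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round an address down to the start of the page that contains it. */
static uint64_t
page_trunc(uint64_t va, uint64_t pgsz)
{
    assert(is_pow2(pgsz));
    return va & ~(pgsz - 1);
}

/* True if a section starting at va begins exactly on a page boundary. */
static bool
starts_on_page(uint64_t va, uint64_t pgsz)
{
    return page_trunc(va, pgsz) == va;
}
```

For example, a hypothetical .data start of 0x1813000 is on a boundary for 4KB pages but not for 16KB pages, while 0x1810000 is on a boundary for both.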

Eduardo

