Subject: Support for ARM9E
To: None <>
From: Scott <>
List: port-arm
Date: 07/27/2006 15:08:02
Hi all,

The processor on my board is an ARM926EJ-S.  NetBSD didn't have support for 
it, so I've gone ahead and added code for it.  My understanding is that the 
ARM9E processors are ARMv5 like the ARM10 processors, and unlike the ARM9 
processors which are ARMv4.  Because of that, I've added CPU_ARM9E and am 
treating it, for the most part, as CPU_ARM10.  This seems to be working just 
fine, but in doing this, I've ended up with a couple of curiosities.

1) I was having trouble when the cache buffering in my early MMU setup did not 
match the "real" MMU setup.  In summary, if the Bufferable (B) bit in my early 
translation table entries was not the same as in the "real" translation table 
entries, I would end up jumping into lala land shortly after setting the 
"real" TTB.

I finally figured out it was due to the way the setttb functions are written 
in cpufunc_asm_*.S.  Here is the ARM10 code (the others are written the same 
way):

	stmfd	sp!, {r0, lr}
	bl	_C_LABEL(armv5_idcache_wbinv_all)
	ldmfd	sp!, {r0, lr}

	mcr	p15, 0, r0, c2, c0, 0	/* load new TTB */

	mcr	p15, 0, r0, c8, c7, 0	/* invalidate I+D TLBs */

The call to armv5_idcache_wbinv_all is trying to get the caches into a 
quiescent state for flipping the TTB, but the ldmfd that restores the 
registers reads the stack, which presumably pulls a line back into the D cache 
and so undoes some of that work.  I was able to get it to work if I added:

	mcr	p15, 0, r0, c7, c7, 0	/* Invalidate I and D caches */

after the ldmfd.  It also worked if I reworked the idcache_wbinv_all function 
to use fewer registers and just stashed r0 and lr in r1 and r2.  Which leads 
us to...

2) The ARM926EJ-S provides some nifty test and clean operations.  From the TRM:

  ... use the following loop to clean the entire DCache:
  tc_loop:	MRC p15, 0, r15, c7, c10, 3	; test and clean
		BNE tc_loop

  ... use the following loop to clean and invalidate the entire DCache:
  tci_loop:	MRC p15, 0, r15, c7, c14, 3	; test clean and invalidate
		BNE tci_loop

I checked the ARM1026EJ-S TRM and it supports these same operations.  The 
question is, does anyone know why NetBSD loops over all the sets/ways instead 
of using the nifty operations?

I'm hoping to push out an ARM9E patch soon, but I'd like to get your thoughts 
on these things first.