Subject: Caching parts of podule space
To: None <port-arm32@netbsd.org>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm32
Date: 09/09/1999 10:05:33
First some background.

I finally got fed up with the sucky performance of my Acorn AKA-31 SCSI
card, so I've rewritten the driver for it to make use of the on-board
buffer memory that can be used for DMA to the SCSI bus.  Instead of the
truly sucking 100K/s or thereabouts that I used to get, I now get about
900K/s with much lower load on the machine.

However, I've hit a brick wall; the bottleneck now seems to be the time
taken to transfer the data to the buffer memory.  The podule space is
mapped uncached (sounds reasonable, you say), but on a StrongARM this
means that ldm/stm transfers are not buffered or streamed, so the hardware
in effect breaks each load/store within the instruction out into a
separate bus transaction, which probably means that the throughput to the
buffer memory is divided by at least 2 and probably 3 (I forget the
details).  Ouch!  Further, these cycles all run at the podule bus speed;
again I forget the numbers, but that's something like 8MHz.

So, to the question.  Is there a way to map just one page of podule space
(the page where the buffer memory is mapped) as cached/bufferable?  I
really think that on a StrongARM this will be a sufficient win to make
syncing the cache during such transfers a price worth paying.
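
For reference, what I'm asking for comes down to two attribute bits in
the ARM second-level page descriptor for that one page: C (cacheable,
bit 3) and B (bufferable, bit 2).  The fragment below is only a sketch of
that bit-twiddling; pte_set_cache_mode() and the PTE_* names are made up
for illustration and are not existing pmap functions or macros.

```c
#include <stdint.h>

/* Attribute bits in an ARM second-level (small page) descriptor. */
#define PTE_B (1u << 2) /* bufferable: writes may use the write buffer */
#define PTE_C (1u << 3) /* cacheable */

/*
 * Hypothetical helper, NOT an existing pmap call: return a copy of the
 * descriptor with the C and B bits set as requested and every other bit
 * (page base address, access permissions, type) left untouched.
 */
static inline uint32_t
pte_set_cache_mode(uint32_t pte, int cacheable, int bufferable)
{
	pte &= ~(PTE_C | PTE_B);	/* start from uncached, unbuffered */
	if (cacheable)
		pte |= PTE_C;
	if (bufferable)
		pte |= PTE_B;
	return pte;
}
```

Whatever ends up writing the new descriptor would also have to flush the
TLB entry for that page, and the driver would have to clean/invalidate
the cache around each transfer, which is the syncing cost mentioned above.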

Richard.

I'll make the code to this available when I've tidied up a few things.