port-mac68k: Xmac68k building with reachover tools

Subject: Xmac68k building with reachover tools
To: None <port-mac68k@NetBSD.org>
From: Michael R.Zucca <mrz5149@acm.org>
List: port-mac68k
Date: 01/11/2004 18:25:53
Looking back, I forgot to send this to the list. Darn Reply vs. Reply 
Only button! :-) I'm wondering if the mac68k gatekeepers are interested 
in my AV SCSI DMA code as it stands or if somebody else should bang on 
it a bit more and get it into better shape. In its current state it 
does yield some benefits, but there is room for some serious 
improvements. If there's interest, I'll file a PR with code/patch 
attached.

> On Jan 7, 2004, at 2:31 PM, John Klos wrote:
>
>> That's definitely a step in the right direction! What about processor
>> utilisation? Even if actual transfer speed is not improved, reducing
>> overhead would be most welcome. After all, asynch SCSI is still much
>> faster than nfs over 10 Mbps ethernet, so this may improve compile 
>> times.
>
> I don't recall the bonnie scores for CPU utilization being all that 
> stunning. Looking at the scores again, the only big improvement seemed 
> to be in block writes for some reason. However, there was little 
> change in CPU utilization on block reads. There is definitely room for 
> improvement! :-)
>
> The real problem with regard to CPU utilization is that the DMA engine 
> doesn't use descriptors. So, since most memory isn't physically 
> contiguous, you are pushing around only 4k, 8k, or 16k on average 
> before you have to take an interrupt and reload the DMA registers for 
> the next physically contiguous block. The other problem is that my 
> patch uses the NCR interrupt instead of the DMA interrupt. That means 
> that each SCSI block transferred takes a side-trip through the generic 
> NCR code every interrupt.
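
A minimal sketch of why the reload count is driven by physical contiguity. It merges physically adjacent pages into runs; each run is one programming of the DMA registers and so roughly one interrupt. The names `coalesce_pages`, `struct run` and the 4096-byte `PAGE_SZ` are assumptions for illustration, not taken from the patch or from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u	/* assumed page size, for illustration only */

/* One physically contiguous run of memory (hypothetical type). */
struct run {
	uint32_t addr;
	uint32_t len;
};

/*
 * Merge physically adjacent pages into runs.
 * 'pages' holds the physical address of each page of the buffer, in order.
 * 'out' must have room for 'npages' entries.
 * Returns the number of runs, i.e. the number of DMA register loads needed.
 */
size_t
coalesce_pages(const uint32_t *pages, size_t npages, struct run *out)
{
	size_t nruns = 0;

	assert(out != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (nruns > 0 &&
		    out[nruns - 1].addr + out[nruns - 1].len == pages[i]) {
			/* Page follows the previous run: extend it. */
			out[nruns - 1].len += PAGE_SZ;
		} else {
			/* Gap in physical memory: start a new run. */
			out[nruns].addr = pages[i];
			out[nruns].len = PAGE_SZ;
			nruns++;
		}
	}
	return nruns;
}
```

With six pages laid out as two adjacent, three adjacent, and one isolated, this yields three runs, so a 24k transfer would still need three register loads.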
>
> It seems to me a better way to use the DMA engine would be as follows:
> 1. Use bus_dma to fetch a list of the physically contiguous memory 
> blocks in the transfer
> 2. Set up the SCSI transfer on the NCR.
> 3. Load the DMA register with the first transfer and enable the DMA 
> interrupt.
> 4. When a DMA interrupt arrives, check for a SCSI error. If there is 
> none, read the next item from the bus_dma map for the transfer, 
> program it into the DMA engine and resume the transfer. If this is the 
> last block, shut off the DMA interrupt.
> 5. Handle the end of the transfer in the SCSI code.
>
> Since the DMA interrupt would only need to check for an error (read the 
> SCSI status register) and then load the next transfer from the bus_dma 
> map, it would be a very fast interrupt.
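
The interrupt path described in steps 3 and 4 above can be sketched in plain C. This is a software model under stated assumptions, not driver code: `struct seg` stands in for one entry of the segment list (in NetBSD that list would come from a loaded `bus_dmamap_t`, via its `dm_segs` array and `dm_nsegs` count), and `struct fake_dma` is a hypothetical model of one address/count register pair plus an interrupt enable. It takes one reading of step 4, in which the DMA interrupt is switched off as soon as the last block has been programmed, leaving the end of the transfer to the SCSI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one physically contiguous segment. */
struct seg {
	uint32_t addr;
	uint32_t len;
};

/* Hypothetical model of the DMA engine, not the real AV register layout. */
struct fake_dma {
	uint32_t addr;
	uint32_t count;
	int irq_enabled;
};

/* Per-transfer state kept by the driver (hypothetical). */
struct xfer {
	const struct seg *segs;
	size_t nsegs;
	size_t next;	/* index of the next segment to program */
	int failed;
};

/* Step 3: program the first segment and enable the DMA interrupt. */
void
xfer_start(struct fake_dma *dma, struct xfer *x,
    const struct seg *segs, size_t nsegs)
{
	assert(nsegs > 0);
	x->segs = segs;
	x->nsegs = nsegs;
	x->next = 1;
	x->failed = 0;
	dma->addr = segs[0].addr;
	dma->count = segs[0].len;
	/* No DMA interrupt is needed if this is already the last block. */
	dma->irq_enabled = (nsegs > 1);
}

/*
 * Step 4: DMA interrupt handler. 'scsi_error' models the result of
 * reading the SCSI status register. Returns 1 if another segment was
 * programmed, 0 if the transfer was stopped.
 */
int
xfer_dma_intr(struct fake_dma *dma, struct xfer *x, int scsi_error)
{
	if (scsi_error || x->next >= x->nsegs) {
		x->failed = scsi_error != 0;
		dma->irq_enabled = 0;
		return 0;
	}
	dma->addr = x->segs[x->next].addr;
	dma->count = x->segs[x->next].len;
	x->next++;
	/* Last block programmed: the SCSI code handles completion. */
	dma->irq_enabled = (x->next < x->nsegs);
	return 1;
}
```

The handler touches only the status check and two register writes, which is the point of the proposal: no trip through the generic NCR code per block.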
>
> Also, it would seem that the two-register scheme of the AV's DMA would 
> allow an additional optimization where you load _both_ registers with 
> the first two blocks of the transfer. Then, when the first transfer 
> finishes, the hardware kicks off the transfer in the other register 
> while the interrupt handler loads the recently finished register set 
> with the next block. Sort of like a two-register pipeline. That would 
> hide some latency but it's not clear to me yet how you program the DMA 
> hardware to do that.
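
The two-register pipeline idea can be modeled as a ping-pong scheme. The sketch below is purely a software model: it assumes the hardware switches to the other register set by itself when the active one completes, which is exactly the part that is not yet known for the AV DMA engine. All names (`struct pingpong`, `pp_start`, `pp_intr`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physically contiguous block of the transfer. */
struct chunk {
	uint32_t addr;
	uint32_t len;
};

/* Hypothetical model of one of the two register sets. */
struct regset {
	uint32_t addr;
	uint32_t count;
	int valid;	/* set holds a block that has not finished yet */
};

struct pingpong {
	struct regset set[2];
	int active;	/* set the modeled hardware is draining */
	const struct chunk *chunks;
	size_t nchunks;
	size_t next;	/* next block to hand to the hardware */
};

static void
fill(struct regset *r, const struct chunk *c)
{
	r->addr = c->addr;
	r->count = c->len;
	r->valid = 1;
}

/* Prime both register sets with the first two blocks. */
void
pp_start(struct pingpong *p, const struct chunk *chunks, size_t nchunks)
{
	assert(nchunks > 0);
	p->chunks = chunks;
	p->nchunks = nchunks;
	p->next = 0;
	p->active = 0;
	p->set[0].valid = p->set[1].valid = 0;
	for (int i = 0; i < 2 && p->next < nchunks; i++)
		fill(&p->set[i], &chunks[p->next++]);
}

/*
 * Interrupt when the active set finishes. The modeled hardware has
 * already moved on to the other set; the handler refills the idle one.
 * Returns 1 while a block is still in flight, 0 when all are done.
 */
int
pp_intr(struct pingpong *p)
{
	int done = p->active;

	p->active ^= 1;
	if (p->next < p->nchunks)
		fill(&p->set[done], &p->chunks[p->next++]);
	else
		p->set[done].valid = 0;
	return p->set[p->active].valid;
}
```

In this model the reload happens while the other set is already transferring, so the interrupt latency is hidden for every block except the first.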
>
> I think it would also be worth it for somebody to make the AV code 
> generic so that both the SCSI and ethernet drivers could use the same 
> code (and eventually serial and floppy). It could be a device that was 
> basically an interrupt handler and start/stop commands.
>
>> Has this been pulled into current?
>
> AFAIK, it hasn't. Perhaps it might be worth it to submit the patch as 
> a PR. It would be nice to hear from the mac68k gatekeepers, though, to 
> see what they think of the code.

-- 
----------------------------------------------
  Michael Zucca - mrz5149@acm.org
----------------------------------------------
  "I'm too old to use Emacs." -- Rod MacDonald
----------------------------------------------