Subject: Re: Quadra AV SCSI DMA Code
To: Aaron Brown <aaron@results-computing.net>
From: Michael R.Zucca <mrz5149@acm.org>
List: port-mac68k
Date: 06/08/2003 20:32:01
On Sunday, June 8, 2003, at 08:33 PM, Aaron Brown wrote:
> Thanks for the code. I'll start looking at it asap. Do you have any
> good sources of information about the Quadra SCSI/DMA controller?
Nothing formal, just what I've gleaned from staring at the ROM
disassemblies and pieced together from the AV tech notes and such.
The SCSI is just your typical NCR SCSI controller that is used in every
other Quadra. NetBSD has a very capable SCSI driver for that chip. The
DMA engine, however, is a pure Apple creation.
I've got most of the DMA engine stuff laid out in PSC code updates that
I made. The only thing I didn't really do was find out which interrupt
the SCSI DMA channel is on, but I'm pretty sure it has a DMA interrupt.
It's probably a low priority interrupt, since I believe the technote
lists it with the lowest interrupt priority. I'll bet it's interrupt 0
on the PSC's level 4 interrupts or it's somewhere in the PSC's level 3
interrupts.
The DMA engine is pretty simple. Dave Huang first described it when he
did DMA ethernet. I took what he had and looked at what the ROM code
did and got a better handle on what's going on.
Basically, there are DMA channels set aside for each device that can do
DMA. Some DMA channels have different width/alignment requirements. For
instance, the SCSI channel appears to do DMA 2 bytes at a time (as
shown in the technote) and requires 16-byte alignment (as seen by
experience), while other channels like the Serial and Floppy channels
can do transactions 1 byte at a time and have different alignment
requirements. I've looked at the floppy ROM code and it appears that
there is no alignment restriction on that channel. I suspect that the
minimum size is device dependent, while the alignment restriction
probably has something to do with the DMA engine and how it was
programmed for a particular channel, or it has something to do with a
cache line size (is 16 bytes an 040 cache line?). Though the DMA engine
is _NOT_ coherent, so why cache line size would matter I can't imagine.
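
To make the SCSI channel constraints concrete, here is a minimal sketch
of a check a driver could run before handing a buffer to the engine. The
helper name and the constants are hypothetical; the 16-byte alignment
and 2-byte unit are just the observations described above, not
documented hardware limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SCSI channel constraints, taken from the observations above. */
#define SCSI_DMA_ALIGN 16u	/* required address alignment, in bytes */
#define SCSI_DMA_UNIT   2u	/* engine moves data 2 bytes at a time */

/*
 * Hypothetical helper: returns true if (addr, len) could be given to the
 * SCSI DMA channel as-is under the assumed constraints.
 */
static bool
scsi_dma_ok(uintptr_t addr, size_t len)
{
	/* The mask test below only works for power-of-two alignments. */
	assert((SCSI_DMA_ALIGN & (SCSI_DMA_ALIGN - 1)) == 0);

	if (len == 0)
		return false;
	if ((addr & (SCSI_DMA_ALIGN - 1)) != 0)
		return false;		/* start address not 16-byte aligned */
	if ((len % SCSI_DMA_UNIT) != 0)
		return false;		/* odd length: needs PIO or padding */
	return true;
}
```

A transfer that fails this check would have to go through PIO or some
other fallback.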
Since there is no snooping between the DMA engine and the CPU, I used
the bus_dma infrastructure NetBSD provides. If you do any further work,
I strongly recommend continuing to use it to avoid weird caching
issues. Besides, it's a really well thought out and cool
interface :-) It also has all the necessary infrastructure to
find/combine contiguous physical regions given a virtual address and
length.
In any case each DMA channel has what I call two DMA "streams" (or
register sets in NetBSD/mac68k parlance). There is a channel
control/status register and three registers for each stream: transfer
address, transfer length, command/status. I think the DMA engine is
supposed to be set up so that you can have one DMA stream running and
another DMA pending, though I haven't used that feature. Everything is
strictly one stream at a time right now with one transfer. In the
future it might be nice to have one transfer "in flight" while queuing
another transfer to go. This would be nice in a scenario where we have
the DMA interrupt doing chaining. If the SCSI setup routine programmed
the first two segments of a transfer, when the DMA engine interrupts
looking for another segment, the other stream could be doing a transfer
while the interrupt is being processed by the CPU! This might yield a
really good latency/throughput win.
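
Here is a rough software model of that two-stream idea, in case it helps
picture it. Everything in it is an assumption made for illustration: the
struct layout, the field names, and the STREAM_BUSY bit are not the real
PSC register map, and the functions touch plain memory rather than
hardware. It only shows the ping-pong bookkeeping: prime both streams,
then refill whichever stream just finished.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of one stream (register set); not the real layout. */
struct dma_stream {
	uint32_t addr;		/* transfer address */
	uint32_t len;		/* transfer length */
	uint16_t cmdstat;	/* command/status */
};

#define STREAM_BUSY 0x0001u	/* hypothetical "armed or running" bit */

struct dma_channel {
	uint16_t ctl;			/* channel control/status */
	struct dma_stream set[2];	/* the two streams */
};

struct dma_seg {
	uint32_t addr;
	uint32_t len;
};

struct dma_xfer {
	const struct dma_seg *segs;	/* physical segments of the transfer */
	size_t nsegs;
	size_t next;			/* first segment not yet loaded */
};

/* Load the next pending segment into stream s if it is idle. */
static int
dma_load_next(struct dma_channel *ch, struct dma_xfer *x, int s)
{
	struct dma_stream *st;

	assert(s == 0 || s == 1);
	st = &ch->set[s];
	if ((st->cmdstat & STREAM_BUSY) != 0 || x->next >= x->nsegs)
		return 0;
	st->addr = x->segs[x->next].addr;
	st->len = x->segs[x->next].len;
	st->cmdstat |= STREAM_BUSY;
	x->next++;
	return 1;
}

/* Setup path: prime both streams; returns how many were loaded. */
static int
dma_start(struct dma_channel *ch, struct dma_xfer *x)
{
	x->next = 0;
	return dma_load_next(ch, x, 0) + dma_load_next(ch, x, 1);
}

/* Interrupt path: stream s finished, so mark it idle and refill it. */
static int
dma_intr(struct dma_channel *ch, struct dma_xfer *x, int s)
{
	ch->set[s].cmdstat &= (uint16_t)~STREAM_BUSY;
	return dma_load_next(ch, x, s);
}
```

With three segments, dma_start() loads the first two, and the first
dma_intr() loads the third into the stream that just completed while the
other stream is still marked busy.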
Check out my routines to see how to control the DMA engine in general.
It's a little magical right now. Perhaps in the future I'll describe it
better.
I think the plan of attack for optimizations is:
1. Find the DMA interrupt and change the code so that the SCSI code
just passes the DMA routines the bus_dma information about the
transfer. Then, when the DMA interrupt fires, the code can just
take the next segment from the bus_dma information and slam it into the
DMA engine. This should improve interrupt latency significantly for
multisegment transfers since reading a value from a bus_dma structure
and slamming it into the DMA engine is much less work than fooling
around in the SCSI state machine to set up the next transfer.
2. Optimize the code so that we do overlapping transfers like I
described above (i.e. one stream running while another loads) for DMAs
that have two or more segments.
3. Do un-16-byte-aligned transfers under 4k by copying the data to/from
a pre-allocated and aligned transfer buffer. This will help solve the
sync negotiation problem by boiling it down to figuring out how to do
odd-sized transfers that appear to be completely DMA.
4. Figure out a way to do odd-sized reads/writes that will satisfy the
sync negotiation code. This might be accomplished using a transfer pad
(a feature of the SCSI chip) or by doing a PIO read/write of the last
byte in the DMA interrupt. I tried doing the read/write of the last
byte in the SCSI interrupt, but I think that it's too late by the time
the SCSI interrupt fires. I think if you took the SCSI interrupt you've
failed the sync negotiation already. I also wonder if a PIO read/write
to the SCSI FIFO would blow the sync negotiation too. In any case,
this is the most obnoxious problem to solve so I suggest you save it
for last. :-)
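
For step 3, a minimal user-space sketch of the bounce-buffer idea might
look like the following. The names (bounce_buf, dma_prepare_out,
dma_finish_in) are hypothetical, the buffer is a plain static array
rather than anything allocated through bus_dma, and I've assumed
transfers of up to 4096 bytes are allowed to bounce. It does nothing
about odd lengths, which is the step 4 problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE  4096u	/* assumed upper limit for bounced transfers */
#define BOUNCE_ALIGN 16u	/* alignment the SCSI channel appears to need */

/* Pre-allocated, 16-byte-aligned transfer buffer (C11 _Alignas). */
static _Alignas(16) unsigned char bounce_buf[BOUNCE_SIZE];

/*
 * Outbound (memory to device): return the address to program into the
 * DMA engine. Aligned buffers are used in place; small unaligned ones
 * are copied into the bounce buffer. Returns NULL if the buffer is
 * unaligned and too large to bounce, so the caller must fall back.
 */
static const void *
dma_prepare_out(const void *buf, size_t len)
{
	if (((uintptr_t)buf & (BOUNCE_ALIGN - 1)) == 0)
		return buf;
	if (len > BOUNCE_SIZE)
		return NULL;
	memcpy(bounce_buf, buf, len);
	return bounce_buf;
}

/*
 * Inbound (device to memory): after a DMA that landed in the bounce
 * buffer completes, copy the data out to the caller's unaligned buffer.
 */
static void
dma_finish_in(void *buf, size_t len)
{
	assert(len <= BOUNCE_SIZE);
	memcpy(buf, bounce_buf, len);
}
```

In a real driver the copy would also have to be bracketed by the
appropriate bus_dma sync calls, since the engine isn't coherent with the
CPU caches.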
If you have any questions, just ask.
--
----------------------------------------------
Michael Zucca - mrz5149@acm.org
----------------------------------------------
"I'm too old to use Emacs." -- Rod MacDonald
----------------------------------------------