Subject: How SCSI DMA works
To: None <port-mips@netbsd.org>
From: Toru Nishimura <nisimura@itc.aist-nara.ac.jp>
List: port-mips
Date: 06/02/2000 18:15:39
Last night I wrote up a short note on SCSI DMA machinery.  It was
originally written for a fellow in the Southern Hemisphere, at 41S
175E, who is working on the R3000 Magnum, but I think it is valuable
for the general public too.  It is slightly incomplete and may be
extended in the future.
Enjoy.

Tohru Nishimura
Nara Institute of Science and Technology

--
//// Notes on SCSI DMA ////

Any DMA transfer imposes address constraints as well as minimum and
maximum transfer sizes.  DMA transfers are done in blocks, and the
transfer size must be a multiple of the block size.  A DMA transaction
might not be allowed to run across a particular address boundary,
because some designs are not flexible enough to cover an arbitrary
range of the entire address space with a given combination of block
base address and block counter.  Care must be taken with fractional
data smaller than the transfer block size, since either the starting
address or the ending address is quite likely not aligned to a block
boundary.  The fractional transfer is managed with the DMA pointer and
special register(s) which hold the residue for the alignment fixup.
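
The arithmetic behind these constraints can be sketched in C as below.
This is only an illustration; the structure and function names are made
up for this note and do not come from any driver, and the block size is
assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one transfer split at block boundaries. */
struct dma_split {
	uintptr_t base;		/* starting address truncated to a block boundary */
	size_t	  head;		/* bytes between base and the real starting address */
	size_t	  nblock;	/* blocks touched, including partial ones */
	size_t	  tail;		/* bytes used in the last block, 0 if it is full */
};

static struct dma_split
dma_split_range(uintptr_t addr, size_t len, size_t blksz)
{
	struct dma_split s;
	uintptr_t end = addr + len;

	/* ASSUMPTION: block size is a power of two and the transfer is not empty. */
	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	assert(len != 0);

	s.base = addr & ~(uintptr_t)(blksz - 1);
	s.head = (size_t)(addr - s.base);
	s.tail = (size_t)(end & (blksz - 1));
	s.nblock = (size_t)((end - s.base + blksz - 1) / blksz);
	return s;
}
```

For example, a transfer of 0x12 bytes starting at 0x1003 with 8B blocks
gives base 0x1000, a head offset of 3, three blocks, and 5 bytes used in
the last block.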

- Push down to SCSI device

The starting address must be truncated to the nearest block boundary,
and the DMA pointer must be adjusted to indicate the correct starting
address of the transfer.  On the other hand, the transfer tail needs no
special care, because the transfer counter in the SCSI controller chip
stops the transaction when all of the outgoing data has been sent to
the SCSI device.

- Pull up from SCSI device

The starting address must be truncated to the nearest block boundary,
and the DMA pointer must be adjusted to indicate the correct starting
address of the memory to be overwritten with SCSI data.  It is
impossible to know in advance how much data the SCSI device will
transfer.  Fractional data smaller than the block size is left
unwritten to memory.  The DMA channel's buffering store holds the
residue, and the DMA driver must pick it up and write it to the correct
memory address to complete the entire transaction.

.... Case study 1. DECstation IOASIC ....

The DECstation IOASIC DMA channel imposes an 8B constraint on starting
and ending addresses.  Fractional data is managed with a pair of 32bit
registers, SDR0 and SDR1, which are treated as concatenated to hold 8B
of fractional data to be transferred.  Fractions are counted in 2B
units and indicated in the SCR register.

- Push down to SCSI device

If the starting address is not aligned to an 8B boundary, SDR0 and SDR1
must hold the entire 8B block in question.  SCR works as a DMA pointer
to indicate which 2B chunk in the SDR0/SDR1 pair is the first data to
be transferred.  The starting address of the blocked DMA transfer is
rounded up to the nearest 8B boundary and given in the DMAPTR register.
An unaligned ending address does not matter, because the SCSI
controller chip counts the total size of the transfer and stops the
transaction when it completes.

- Pull up from SCSI device

If the starting address is not aligned to an 8B boundary, SDR0 and SDR1
must hold the entire 8B block in question.  SCR works as a DMA pointer
to indicate which 2B chunk in the SDR0/SDR1 pair is to hold the head
portion of the transferred data.  The starting address of the DMA
transfer is truncated down to the nearest 8B boundary and given in
DMAPTR.  The first block written to memory consists of two portions:
the SDR0/SDR1 data placed unchanged, and the head data of the SCSI
transfer.  A fractional transfer tail smaller than the 8B block size is
left unwritten to memory and stored in the SDR0/SDR1 pair instead.  SCR
indicates how many 2B chunks in the pair are subject to fixup.  In this
case DMAPTR points to the address of the 8B block that has yet to
receive the residue.  The DMA driver must fix up the transfer tail by
writing the 2B chunks in sequence, at most 3 times, to the destination
address.
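
The tail fixup might look roughly like the sketch below.  The function
name is invented for this note, and the order of the 2B chunks inside
the SDR0/SDR1 pair (low half of SDR0 first, then its high half, then
the low half of SDR1) is an assumption, not something taken from the
chip documentation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Write the residue held in the SDR0/SDR1 pair to memory, 2B at a time.
 * dst is the location of the 8B block that DMAPTR points to, nchunk is
 * the chunk count reported by SCR.
 * ASSUMPTION: chunk 0 is the low halfword of SDR0, chunk 1 its high
 * halfword, chunk 2 the low halfword of SDR1.
 */
static void
ioasic_tail_fixup(uint16_t *dst, uint32_t sdr0, uint32_t sdr1,
    unsigned nchunk)
{
	/* A residue smaller than 8B is at most three 2B chunks. */
	assert(nchunk <= 3);

	if (nchunk > 0)
		dst[0] = (uint16_t)(sdr0 & 0xffff);
	if (nchunk > 1)
		dst[1] = (uint16_t)(sdr0 >> 16);
	if (nchunk > 2)
		dst[2] = (uint16_t)(sdr1 & 0xffff);
}
```

Memory beyond the reported chunks is left untouched, which is the whole
point of doing the fixup by hand instead of writing a full 8B block.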

.... Case study 2. DEC3000 TCDS ....

The DEC3000 TCDS DMA channel imposes a 4B constraint.  Because the
Alpha processor enforces 4B alignment on any memory reference, it is
mostly unnecessary to worry about an unaligned DMA starting address;
the address is likely already aligned for DMA.  The hard case that
matters is a pull-up transfer from the SCSI device which starts at an
unaligned address or ends with a fractional residue smaller than 4B.

Two registers, DUD0 and DUD1, each hold a 4B quantity.  Each consists
of a 1B indication and 3B worth of fractional data.  DUD0 is for an
unaligned starting address, while DUD1 is for an unaligned ending
address.  DUD0 may have an indication in the least significant byte
telling which bytes of the remaining 3B hold fractional residue to be
written to memory.  As the DMA driver knows the starting address of the
DMA transfer, it is easy to synthesize the destination word with the
fractional data in DUD0.  DUD1 may have an indication in the most
significant byte telling which bytes of the remaining 3B hold the
fractional residue left unwritten to memory when the DMA transaction
stopped.  In this case the SDA register indicates the address of the 4B
block that has yet to receive the residue.  The DMA driver must fix up
the transfer tail by synthesizing the destination 4B with the
fractional residue in DUD1.
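
"Synthesizing the destination 4B" amounts to a masked merge, which can
be sketched as below.  The function name is made up, and the exact
encoding of the indication byte is an assumption for illustration: here
bit 24+n of DUD1 is taken to mean that byte n of the low 3B is valid.
The real bit assignment must be checked against the TCDS documentation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge the residue bytes held in DUD1 into the 4B word at the address
 * given by SDA, leaving the other bytes of that word unchanged.
 * ASSUMPTION: bit (24 + n) of dud1 set means byte n (n = 0..2) of the
 * low three bytes holds valid residue data.
 */
static uint32_t
tcds_merge_residue(uint32_t memword, uint32_t dud1)
{
	uint32_t mask = 0;
	unsigned n;

	for (n = 0; n < 3; n++)
		if (dud1 & (UINT32_C(1) << (24 + n)))
			mask |= UINT32_C(0xff) << (8 * n);

	/* The indication byte itself is never copied to memory. */
	assert((mask >> 24) == 0);

	return (memword & ~mask) | (dud1 & mask);
}
```

The driver would read the word at the SDA address, pass it through a
merge of this kind, and write the result back.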

.... Case study 3. Magnum 3000 RAMBO ASIC ....

The RAMBO DMA channel imposes a 64B constraint.  It takes a block base
address and a block count as a pair to start a DMA transfer.  RAMBO DMA
manages a physically contiguous section of memory.  A 64B DMA FIFO
buffer is provided to handle unaligned DMA starting and ending
addresses.

- Pull up from SCSI device

If the starting address is not aligned to a 64B boundary, the transfer
block base address is truncated to the nearest boundary.  The correct
starting address is then set up through the DMA pointer: pushing a
16bit quantity into the DMA FIFO advances the DMA pointer by 2B.  The
DMA channel then starts filling the DMA FIFO with incoming data from
the location the DMA pointer indicates.  The first 64B block consists
of two portions: the pushed data placed unchanged, and the head data of
the SCSI transfer.  A fractional transfer tail smaller than a 64B block
is left unwritten to memory.  The DMA FIFO holds the residue, and the
DMA pointer indicates how many 2B chunks are to be written to the
destination, whose address is available in another DMA register.
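
The priming step for an unaligned pull-up can be modelled in software
as below.  This is a toy model to show the pointer arithmetic only; the
structure is not the real RAMBO register layout, all names are invented
for this note, and the starting address is assumed to be 2B aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAMBO_BLKSZ	64u

/* Software model of the FIFO, NOT the real register layout. */
struct rambo_model {
	uint16_t fifo[RAMBO_BLKSZ / 2];
	unsigned ptr;		/* DMA pointer, byte offset within the block */
};

/* Pushing one 16bit quantity advances the DMA pointer by 2B. */
static void
rambo_fifo_push(struct rambo_model *m, uint16_t v)
{
	assert(m->ptr < RAMBO_BLKSZ);
	m->fifo[m->ptr / 2] = v;
	m->ptr += 2;
}

/*
 * Prime the FIFO for a pull-up starting "offset" bytes into a 64B
 * block.  "block" is the current memory content of that block, so the
 * bytes ahead of the transfer are written back unchanged.
 * Returns the number of 16bit pushes performed.
 */
static size_t
rambo_prime_pullup(struct rambo_model *m,
    const uint16_t block[RAMBO_BLKSZ / 2], size_t offset)
{
	size_t i;

	/* ASSUMPTION: the starting address is 2B aligned. */
	assert(offset < RAMBO_BLKSZ && (offset & 1) == 0);

	m->ptr = 0;
	for (i = 0; i < offset / 2; i++)
		rambo_fifo_push(m, block[i]);
	return i;
}
```

With a starting offset of 6 bytes, three pushes are made and the model's
DMA pointer ends up at 6, where the incoming SCSI data would begin.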

- Push down to SCSI device

If the starting address is not aligned to a 64B boundary, the transfer
block base address is truncated to the nearest boundary.

XXX unable to figure out how the initial block is moved to DMA FIFO XXX

Reading a 16bit quantity from the DMA FIFO advances the DMA pointer by
2B.  The DMA channel then starts draining the DMA FIFO contents down to
the SCSI device from the location the DMA pointer indicates.  An
unaligned ending address does not matter, because the SCSI controller
chip counts the total size of the transfer and stops the transaction
when it completes.

.... Case study 4. SPARCstation LSI64854 ASIC ....

.... Case study 5. DEC3000 TC SG DMA ....

High end models of the DEC3000 have a DMA channel which can handle
transfers of data that is not contiguous in physical address.  Such a
design is commonly called "scatter-gather DMA."  The TC SGMAP is an
array store which holds the physical addresses, or rather the page
frame numbers, of a given memory object.  The DMA driver must fill in
the SGMAP array for the virtually addressed DMA transfer range prior to
the DMA operation.  The DMA channel then starts and continues the
transfer by looking at the SGMAP array entries in sequence.  The SGMAP
design was inherited by later generations of Digital models.
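
Filling such an array can be sketched as below.  Everything here is
hypothetical: the flat page table "pt" indexed by virtual page number
stands in for whatever translation the kernel really provides, the
entry format is reduced to a bare page frame number, and the 8KB page
size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	13	/* ASSUMPTION: 8KB pages */

/*
 * Fill sgmap[] with the page frame numbers backing the virtual range
 * [va, va + len).  pt[] is a hypothetical flat table mapping virtual
 * page number to page frame number.  Returns the entries written.
 */
static size_t
sgmap_load(uint32_t *sgmap, size_t nent, const uint32_t *pt,
    uintptr_t va, size_t len)
{
	uintptr_t vpn, last;
	size_t n = 0;

	assert(len != 0);
	vpn = va >> PAGE_SHIFT;
	last = (va + len - 1) >> PAGE_SHIFT;
	assert(last - vpn < nent);	/* the range must fit in the array */

	/* One entry per page, in the order the DMA channel will walk them. */
	for (; vpn <= last; vpn++)
		sgmap[n++] = pt[vpn];
	return n;
}
```

Note that a range only slightly longer than one page can still need
three entries when it starts near the end of a page.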

.... Case study 6. ARCS ASIC ....

This design is unique because the DMA channel can manage virtually
addressed DMA transfers.  A traditional DMA design can work only with
physically addressed memory objects, because it has no knowledge of the
address translation scheme for virtually addressed memory objects.  In
that case the DMA driver is in charge of resolving the virtual transfer
address into a physical address prior to the DMA operation.

The ARCS DMA channel runs a virtually addressed DMA transfer by looking
at a copy of the TLB entries, kept in an array which describes the
transfer range, to resolve the corresponding physical addresses.

---