Subject: Re: FreeBSD Bus DMA
To: Justin T. Gibbs <gibbs@plutotech.com>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 06/11/1998 17:45:31
Justin,

Could you *please* stop overloading the term "s/g list"?
It merely causes confusion....


>I was talking about static storage of the MI S/G list. My complaint is that
>a large portion of what is in the dma map (the MI S/G list) doesn't need to
>be in the map at all and that by including it there, the implementation is
>forced to consume more memory space than it might otherwise have to.

It's simply not true that what you call the "s/g list" doesn't need to
be in the dmamap at all.  I've said so before; what part of "no" isn't
getting through?

There *are* machines where the DMA simply cannot be done unless that
info is kept around somewhere where an interrupt handler can get at it
with very low latency.  On those machines, not having the dmamap
available at all times whilst the DMA is in progress is a simple
correctness problem.

There *are* machines where DMA is non-cache-coherent; on these machines,
DMA simply doesn't work properly unless the bus layer has access to the
`MI' "s/g list" (tho IIRC, the granularity may not be entirely MI).
On those machines, not having the dmamap available at dma_sync time is
a simple correctness problem.
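
To make the second case concrete, here is a minimal sketch of why a
sync routine needs the list.  Everything in it is hypothetical and for
illustration only: struct seg, struct dmamap, the fixed 8-entry array,
cache_sync_range() and dmamap_sync() are made-up names, not the real
bus_dma interface, and the "cache operation" just counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment and map layouts, for illustration only. */
struct seg {
	uintptr_t addr;		/* bus address of the segment */
	size_t    len;		/* length in bytes */
};

struct dmamap {
	int        nsegs;	/* valid entries in segs[] */
	struct seg segs[8];	/* the "s/g list" kept with the map */
};

/* Stand-in for a machine-dependent cache writeback/invalidate. */
size_t bytes_synced;

void
cache_sync_range(uintptr_t addr, size_t len)
{
	(void)addr;		/* a real port would act on this range */
	bytes_synced += len;
}

/*
 * Sync every segment of a loaded map.  This only works because the
 * bus layer can still reach the segment list at sync time.
 */
void
dmamap_sync(const struct dmamap *map)
{
	int i;

	assert(map != NULL && map->nsegs >= 0 && map->nsegs <= 8);
	for (i = 0; i < map->nsegs; i++)
		cache_sync_range(map->segs[i].addr, map->segs[i].len);
}
```

If the list has been thrown away after the load, there is nothing for
the loop to walk.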

I hope that much is not in dispute.

From there, I think the discussion focuses on the relative merits of
two approaches:

     a) three bits of ugliness: 
         i) have the bus layer keep hold of a secret hook on the "closure"
             (the argument passed to the driver-specific, driver-supplied
	     "callback"  function)
	 ii) provide a function as part of the bus-dma layer,
	      with the same signature as the driver-specific callback,
	      which malloc()s up a copy of the "s/g list";
	 iii) Call (ii), with arguments from (i), at some suitable time.


     b) just append the info to the end of the dmamap
        (which is then variable-length).
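
As a rough sketch of the two shapes (one possible reading, not either
system's actual code): all names below are hypothetical, struct
seg_copy is my guess at what the hook in (i) would point at, and
step (iii) is simply the bus layer calling dma_copy_segs() with that
saved pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct seg { unsigned long addr; unsigned long len; };	/* hypothetical */

/* (b): the map is variable-length; the list rides at the end of it. */
struct dmamap_b {
	int        nsegs;
	struct seg segs[];	/* C99 flexible array member */
};

struct dmamap_b *
dmamap_b_create(const struct seg *segs, int nsegs)
{
	struct dmamap_b *map;

	assert(nsegs >= 0);
	map = malloc(sizeof(*map) + (size_t)nsegs * sizeof(map->segs[0]));
	if (map == NULL)
		return NULL;
	map->nsegs = nsegs;
	if (nsegs > 0)
		memcpy(map->segs, segs, (size_t)nsegs * sizeof(map->segs[0]));
	return map;
}

/* (a)(i): what the bus layer's saved closure would point at. */
struct seg_copy {
	struct seg *segs;	/* malloc()ed copy, or NULL */
	int         nsegs;
};

/* (a)(ii): same signature as a driver callback; malloc()s a copy. */
void
dma_copy_segs(void *closure, const struct seg *segs, int nsegs)
{
	struct seg_copy *copy = closure;

	copy->nsegs = 0;
	copy->segs = NULL;
	if (nsegs <= 0)
		return;
	copy->segs = malloc((size_t)nsegs * sizeof(*segs));
	if (copy->segs == NULL)
		return;		/* caller sees nsegs == 0 on failure */
	memcpy(copy->segs, segs, (size_t)nsegs * sizeof(*segs));
	copy->nsegs = nsegs;
}
```

Either way the same bytes end up being kept; (a) just reaches them
through an extra pointer and an extra allocation.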

The stated benefit of (a) over (b) is saving a trivial amount of space,
for the case where neither the bus-dma layer nor the driver needs the
"dmamap"/"s/g list".  And it is trivial: assuming 4k segments aligned
to a 4k boundary, your 1Meg transfer needs 256 8-byte entries: 2kbytes,
or an overhead of 0.2%.
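
The arithmetic, spelled out as a small helper.  The function name is
made up, and the 8-byte entry size is only the assumption used in the
figures above, not a property of any particular port.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of s/g list needed for a transfer that starts segment-aligned. */
size_t
sg_list_bytes(size_t xfer_len, size_t seg_len, size_t entry_size)
{
	assert(seg_len != 0);
	/* Round up so a partial trailing segment still gets an entry. */
	return ((xfer_len + seg_len - 1) / seg_len) * entry_size;
}
```

sg_list_bytes(1024 * 1024, 4096, 8) gives 2048 bytes; 2048 out of
1048576 is a little under 0.2%.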

That seems very very slim grounds for changing an API that could
otherwise be shared and allow driver sharing. Yes or no?

(Historically, isn't it also the case that you brought up this exact
question at least once during the bus-dma design phase, that these
same points were made, that you objected strongly to the idea of having
"two copies" of "the same information" -- the bus_dma "s/g list" and
the device's own s/g list, and that everyone else considered the space
costs to be trivial and the costs of a closure where the dmamap "s/g
list" *is* needed to be at least as odious? We really are arguing
over the very same point yet again?)


Further, the __real__ stated objective here was to improve ISA
bounce-buffering (again, only on x86-like devices).  The example given
was an ISA system with 3 SCSI controllers, which could consume up to 1M
of below-16M memory.  Your earlier claims notwithstanding, the
callback change to the API actually has no impact whatsoever on
improving management and allocation of ISA bounce-buffer space.

(NB: there are machines where all ISA dma is done through a tiny
hardware DMA bounce buffer, so even the given example is specious.)


>The S/G list for a load operation is not stored in the dma map object.
>It is provided to the client of bus_dmamap_load() via a callback function.
>The lifetime of a mapping is the same as in NetBSD, but the client must
>assume that the S/G list passed into its callback function is only valid
>during the lifetime of the callback (i.e. you can't squirrel away a pointer
>to the S/G list, you must either use it or copy it to private storage).

But on machines where you *do* need the dmamap, this is a net
performance loss; see the points (i) .. (iii) above.