Subject: Re: FreeBSD Bus DMA
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Justin T. Gibbs <gibbs@plutotech.com>
List: tech-kern
Date: 06/12/1998 08:10:48
>
>Justin,
>
>Could you *please* stop overloading the term "s/g list"?
>It merely causes confusion....

Okay.  I'll use dm_segs.

>>I was talking about static storage of the MI S/G list. My complaint is that
>>a large portion of what is in the dma map (the MI S/G list) doesn't need to
>>be in the map at all and that by including it there, the implementation is
>>forced to consume more memory space than it might otherwise have to.
>
>It's simply not true that what you call the "s/g list" doesn't need to
>be in the dmamap at all.  I've said so before; what part of "no" isn't
>getting through?

The part that doesn't qualify this by saying that dm_seg data is required
only by certain (perhaps many) bus dma implementations, but not by all of
them.  I am trying to distinguish between data storage mandated by an API
and data storage required by an implementation.  From your comments and
those of others, it seems that NetBSD has a large body of shared bus dma
implementation code that makes use of the public dm_segs data.  If that
is the case, why can't this data simply be made "private" to the
implementation?  That does not preclude MI code should you decide to
implement it that way, but it does give other implementations of the API
the flexibility to do it in a perhaps more efficient manner.
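
To make the distinction concrete, here is a minimal sketch of what
"private" would mean.  The layout and the _dm_* names below are purely
illustrative (they aren't lifted from either tree); the point is that
the header a client sees exposes only an opaque handle, so whether any
segment storage hangs off the map is the implementation's business:

	/* Illustrative types; each port supplies its own. */
	typedef unsigned long bus_addr_t;
	typedef unsigned long bus_size_t;

	typedef struct {
		bus_addr_t	ds_addr;	/* bus address of segment */
		bus_size_t	ds_len;		/* length of segment      */
	} bus_dma_segment_t;

	/* MD public header: clients only ever hold an opaque handle. */
	typedef struct bus_dmamap *bus_dmamap_t;

	/* One port's private definition.  A port that has no use for a
	 * resident segment list is free to omit _dm_segs entirely.    */
	struct bus_dmamap {
		int			_dm_segcnt;	/* max segments    */
		int			_dm_nsegs;	/* currently valid */
		bus_dma_segment_t	_dm_segs[1];	/* variable length */
	};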

>There *are* machines where the DMA simply cannot be done unless that
>info is kept around somewhere where an interrupt handler can get at it
>with very low latency.  On those machines, not having the dmamap
>available at all times whilst the DMA is in progress is a simple
>correctness problem.
>
>There *are* machines where DMA is non-cache-coherent; on these machines,
>DMA simply doesn't work properly unless the bus layer has access to the
>`MI' "s/g list" (tho IIRC, the granularity may not be entirely MI).
>On those machines, not having the dmamap available at dma_sync time is
>a simple correctness problem.
>
>I hope that much is not in dispute.

I don't contest either of these points.  Stick the dm_segs data into the
dma map in the bus dma implementations for those machines, then.  Just
make the data "private" so this burden isn't forced on *all machines*.

>From there, I think the discussion focuses on the relative merits of
>two approaches:
>
>     a) three bits of ugliness: 
>         i) have the bus layer keep hold of a secret hook on the "closure"
>             (the argument passed to the driver-specific, driver-supplied
>	     "callback"  function)
>	 ii) provide a function as part of the bus-dma layer,
>	      with the same signature as the driver-specific callback,
>	      which malloc()s up a copy of the "s/g list";
>	 iii) Call (ii), with arguments from (i), at some suitable time.

This is not how the FreeBSD implementation of the API works, although you
certainly are free to implement it that way.  For instance:

	i) callback data only needs to be stored for deferred operations.
	   In the common case, "keeping hold of the secret hook" is nothing
	   more than referencing two arguments in your function.

       ii) I can't parse what you mean here.  The dm_seg data is never
	   "malloced" in the FreeBSD implementation.  It is generated for
	   the client and then discarded as the implementation does not
	   require it in order to perform its job.  Another implementation
	   may simply pass in a pointer to dm_seg data it stores in the
	   opaque dma map object.

      iii) Actually, you call the stored callback function with the
	   supplied argument at a later time.  I don't see why you believe
	   that the API dictates how this is implemented, all the way down
	   to the parameters of an internal function.  The FreeBSD
	   implementation doesn't work this way; a sketch of the client's
	   side of the interface follows below.
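
To make (i)-(iii) concrete, here is a sketch of the client's side of a
callback-style load.  The load and callback signatures reflect the shape
of the interface described above; the hw_sg and xfer names are invented
for the example and are not taken from the ahc driver.  Note that the
dm_segs array handed to the callback only has to live for the duration
of the call, because the driver copies it straight into its own hardware
format:

	/* bus_addr_t, bus_size_t, and bus_dma_segment_t as sketched
	 * earlier; the tag is another opaque handle.                 */
	typedef struct bus_dma_tag *bus_dma_tag_t;

	typedef void bus_dmamap_callback_t(void *arg,
	    bus_dma_segment_t *segs, int nseg, int error);

	int	bus_dmamap_load(bus_dma_tag_t tag, bus_dmamap_t map,
		    void *buf, bus_size_t buflen,
		    bus_dmamap_callback_t *callback, void *callback_arg,
		    int flags);

	struct hw_sg {			/* adapter's native S/G format  */
		unsigned int	addr;
		unsigned int	len;
	};

	struct xfer {			/* per-transaction driver state */
		struct hw_sg	sg[256];
		int		sg_count;
	};

	static void
	xfer_dma_callback(void *arg, bus_dma_segment_t *segs, int nseg,
	    int error)
	{
		struct xfer *xp = arg;
		int i;

		if (error != 0)
			return;		/* fail the transaction elsewhere */

		/* Use or copy segs *now*; no pointer to it may be kept. */
		for (i = 0; i < nseg; i++) {
			xp->sg[i].addr = (unsigned int)segs[i].ds_addr;
			xp->sg[i].len = (unsigned int)segs[i].ds_len;
		}
		xp->sg_count = nseg;
		/* ... hand xp to the adapter ... */
	}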

>     b) just append the info to the end of the dmamap
>        (which is then variable-length).
>
>The stated benefit to (b) is saving a trivial amount of space, for the
>case where neither the bus-dma layer nor the driver need the "dmamap"/
>"s/g list".  And it is trivial: assuming 4k transfers aligned to a 4k
>boundary, your 1Meg transfer needs 256 8-byte entries: 2kbytes, or an
>overhead of 0.2%.

Percentage of transfer size fails to show the real cost.  If the system
is designed to handle a worst case transfer size of 1MB, then all
transactional resources had better be prepared to handle transfers of
that size.  This means 2K of hardware S/G storage per transaction that
the ahc driver must allocate even if only a small fraction of transfers
ever hit the 1MB mark.  The amount of storage allocated by the bus dma
implementation for the MI dm_segs copy depends on its size and format;
depending on the architecture it could be larger than 2K, but we'll
assume it is only 2K as well.  Now, to achieve good tagged queuing
performance, the ahc driver will allocate space (on demand, of course)
for up to 255 transactions.  At 4K per transaction (the MI copy plus the
hardware copy), that translates to almost 1MB of dm_segs-like storage
per adapter.  That number can be cut in half on some implementations if
the API allows the MI copy to be dropped.
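
Spelled out, under the 2K assumption above (exact sizes will of course
vary with the architecture and the adapter):

	  256 segments * 8 bytes =    2K  (MI dm_segs copy)
	+ 256 entries  * 8 bytes =    2K  (ahc hardware S/G list)
	                           -----
	                              4K  per transaction
	          * 255 transactions
	                           -----
	                          ~1020K  per adapter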

>That seems very very slim grounds for changing an API that could
>otherwise be shared and allow driver sharing. Yes or no?

You seem to be merging two orthogonal reasons for having a callback.

1) Handling deferred requests. (ISA bounce buffer example).

2) Removing the need for two copies of the dm_segs information on some
   implementations.

In my book, reason #1 is more than enough to change the API even if
NetBSD doesn't follow suit.  Reason #2 is unrelated to #1, but it comes
for free if you take full advantage of the callback mechanism.
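
For #1, the driver's submission path needs nothing more exotic than the
following (again reusing the illustrative names from the earlier
sketches; the EINPROGRESS convention and the 0 flags value are just for
illustration):

	#include <errno.h>	/* EINPROGRESS; <sys/errno.h> in the kernel */

	/* If, say, below-16MB bounce pages are exhausted, the load is
	 * queued and the callback runs when space frees up, instead of
	 * every adapter reserving worst-case bounce space up front.   */
	static int
	xfer_start(bus_dma_tag_t tag, bus_dmamap_t map, struct xfer *xp,
	    void *buf, bus_size_t len)
	{
		int error;

		error = bus_dmamap_load(tag, map, buf, len,
		    xfer_dma_callback, xp, /*flags*/0);
		if (error == EINPROGRESS)
			return (0);	/* deferred; callback fires later */
		return (error);		/* 0 means it already ran         */
	}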

>(Historically, isn't it also the case that you brought up this exact
>question at least once during the bus-dma design phase, that these
>same points were made, you objected strongly to the idea of having
>"two copies" of "the same information" -- the bus_dma "s/g list" and
>the device's own s/g list, and that everyone else considered the space
>costs to be trivial and the costs of a closure where the dmamap "s/g
>list" *is* needed to be at least as odious?  We really are arguing
>over the very same point yet again?)

I don't believe I ever brought up the notion of using a callback to address
the problem, but, yes, I did complain about duplicate storage at that time.

>Further, the __real__ stated objective here was to improve ISA
>bounce-buffering (again, only on x86-like devices).  The example given
>was an ISA system with 3 SCSI controllers, which could consume up to 1M
>of below-16M memory.

Actually, the amount of space is 64K per transaction, which in this
example leads to the reservation of 3MB of below-16MB space.  Other
adapters that handle more concurrent transactions would require even more.

>Your earlier claims notwithstanding, the
>callback change to the API actually has no impact whatsoever on
>improving management and allocation of ISA bounce-buffer space.

As Jason has even admitted, some mechanism to handle deferred allocations
would be a reasonable addition to the bus DMA API.  I have shown how at
least one implementation uses this to considerably reduce its memory
footprint.  The only alternatives that have been proposed for this problem
so far are, again, mechanisms to defer the allocation.

>(NB: there are machines where all ISA dma is done through a tiny
>hardware DMA bounce buffer, so even the given example is specious.)

Since some machines can do this in an intelligent fashion, the "dumb"
machines must be penalized?  

>>The S/G list for a load operation is not stored in the dma map object.
>
>
>>It is provided to the client of bus_dmamap_load() via a callback function.
>>The lifetime of a mapping is the same as in NetBSD, but the client must
>>assume that the S/G list passed into it's callback function is only valid
>>during the lifetime of the callback (i.e. you can't squirrel away a pointer
>>to the S/G list, you must either use it or copy it to private storage).
>
>But on machines where you *do* need the dmamap, this is a net
>performance loss; see the points (i) .. (iii) above.

I see a wash.  Matt Thomas sees a wash.  Where is the performance penalty?

--
Justin