Subject: FreeBSD Bus DMA (was Re: AdvanSys board support)
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Justin T. Gibbs <gibbs@plutotech.com>
List: tech-kern
Date: 06/10/1998 14:10:56
>>  I had hoped to share an identical bus dma interface with
>>NetBSD, but I am unwilling to do this so long as the NetBSD interface
>>sacrifices speed and memory resources for absolutely no gain in
>>portability.
>
>A lot of people looked very hard at Jason's bus-space and bus-dma
>ideas, during more than a year of discussion. There was a _lot_ of
>thought about portability to a very wide range of CPUs and buses.
>Jason's ideas evolved to accommodate the wide range of hardware NetBSD
>developers support. With people pulling for performance at the same
>time.
>
>Could you describe which parts of the original bus_dma interface which
>are burning resources and speed for no good reason?  Maybe you're
>right and there are gains to be made here for everyone. Maybe there
>are some issues for arcane I/O buses which you haven't fully
>considered.  Or maybe NetBSD is aimed more to portability in the
>portability/performance tradeoff than FreeBSD
>(translation: you're both right).

Most of the changes in the FreeBSD bus dma interface were designed to give
the underlying implementation more flexibility in how it represents things
like dma maps and dma tags.  This means that, should the underlying
implementation encounter a "no-op" situation, it can handle it without
allocating any per-transaction resources.

The other major change is to allow dma map loads to be optionally deferred
until they can be satisfied.  This dramatically reduces the amount of
memory required to implement things like bounce buffers, since you don't
need to allocate, up front, enough resources to cover the worst case of
every page being bounced in every transaction a client may have active at
once.  To give a concrete example, three 1542 cards in a NetBSD-i386
system with 32MB of ram would require 3MB of memory below 16MB to be
allocated.  In FreeBSD, we top out at around 512k, an easily tuned maximum.
In tests on my systems here, I found that the large majority of pages in
transfers did not require any bouncing.  In the cases where bouncing was
required, the limited-size pool of bounce pages was more than enough to
never defer a request, even when talking to 4 Seagate Hawks (63 tags each
on an AdvanSys ABP-5140 with space for 255 transactions).

Here's an enumeration of the major changes:

1) bus_dmamap_load now looks like this:

/* 
 * A function that processes a successfully loaded dma map or an error
 * from a delay loaded map.
 */
typedef void bus_dmamap_callback_t(void *, bus_dma_segment_t *, int, int);

/* 
 * Map the buffer buf into bus space using the dmamap map.
 */
int bus_dmamap_load(bus_dma_tag_t dmat, bus_dmamap_t map, void *buf, 
                    bus_size_t buflen, bus_dmamap_callback_t *callback,
                    void *callback_arg, int flags);

So the client must specify a function to receive the dma segment list
and possibly any errors.  If the operation is deferred, EINPROGRESS is
returned.  The interface allows the client to request that the operation
fail rather than be deferred.  Clients with that requirement should ensure
that all resources are allocated in advance by specifying the appropriate
flags when their dma tags and dma maps are created.
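
For illustration, here is a minimal sketch of how a driver's start routine
might drive this.  Everything other than bus_dmamap_load and the callback
signature (the xx_* names, the softc and request fields, and the error
handling) is invented for the example:

/* Defined further down; copies the segment list into the controller's format. */
static void xxdma_callback(void *arg, bus_dma_segment_t *segs,
                           int nseg, int error);

static void
xxstart(struct xx_softc *sc, struct xx_request *req)
{
        int error;

        error = bus_dmamap_load(sc->buffer_dmat, req->map, req->buf,
                                req->buflen, xxdma_callback, req,
                                /*flags*/0);
        if (error == EINPROGRESS) {
                /*
                 * Mapping resources (e.g. bounce pages) are not currently
                 * available.  The callback will run once the load can be
                 * satisfied, so there is nothing more to do here.
                 */
                return;
        }
        if (error != 0)
                xx_done(req, error);    /* Immediate failure; no callback. */
        /* On success (0), xxdma_callback has already been invoked. */
}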

The client must assume that the lifetime of the passed-in bus_dma_segment
array is that of the callback function.  Having static space allocated to
hold the mapping is of no use to the client in typical practice.  The MI
S/G format is rarely the exact format required by the client, and even if
it is, the bus dma framework makes no guarantee that the S/G list is
allocated in memory that can be dma'ed from, a requirement for many
clients.  So, if the bus dma implementation does not require this exact
data in order to perform its task (as is the case in the FreeBSD bus dma
implementation), it is free to allocate temporary memory to store the
mappings.  This may not seem like a large savings, but consider that
mapping a 1MB transfer using 32-bit address and count pointers would
require 256k of allocated memory in both the client and the dma map in
order to handle the worst case.  Cutting that requirement in half seemed
a wise thing to do.
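
Continuing the sketch from above, the callback would typically copy the
mapping into the driver's own, dma-able S/G area before it returns, since
the segment array may not outlive the callback.  The xx_sgentry layout and
the request fields are again invented for the example:

/*
 * Hypothetical hardware S/G element; a real controller defines its own
 * layout, typically living in memory the controller can dma from.
 */
struct xx_sgentry {
        u_int32_t addr;
        u_int32_t count;
};

static void
xxdma_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
        struct xx_request *req = arg;
        int i;

        if (error != 0) {
                /* A deferred load failed; terminate the request. */
                xx_done(req, error);
                return;
        }

        /* Copy the mapping before the segment array goes away. */
        for (i = 0; i < nseg; i++) {
                req->sg[i].addr = segs[i].ds_addr;
                req->sg[i].count = segs[i].ds_len;
        }
        xx_queue_to_controller(req, nseg);
}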

2) The concept of dma tags has been extended to be more hierarchical than
in the NetBSD implementation.  In a typical FreeBSD driver, a parent tag
is provided (it should be passed in the attach args, but our configuration
code is a mess) and then "derived from" in order to specify the exact dma
characteristics of the device.  The idea is to allow a hierarchy to be
built in which each component, be it "main bus", host->pci bridge,
pci->pci bridge, pci driver attach front end, or the MI driver, can
further refine or restrict dma for the device.  The implementation of
things like bounce buffers is "global" and can be used by any tag that
imposes these kinds of limitations.  The implementation is prepared to
handle even a brain-dead PCI device that for some reason or another cannot
perform transfers to the full, normal, PCI address range.
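
Roughly, and glossing over the exact argument list, deriving a tag for a
device that can only dma below 16MB looks something like the following
(XX_NSEG, sc->buffer_dmat, the constants, and the error handling are
placeholders for the example):

        /* Derive a tag restricting dma to the first 16MB of memory. */
        error = bus_dma_tag_create(parent_dmat,
                                   /*alignment*/1,
                                   /*boundary*/0,
                                   /*lowaddr*/BUS_SPACE_MAXADDR_24BIT,
                                   /*highaddr*/BUS_SPACE_MAXADDR,
                                   /*filter*/NULL, /*filterarg*/NULL,
                                   /*maxsize*/MAXBSIZE,
                                   /*nsegments*/XX_NSEG,
                                   /*maxsegsz*/BUS_SPACE_MAXSIZE_24BIT,
                                   /*flags*/0,
                                   &sc->buffer_dmat);

Any transfer loaded through sc->buffer_dmat that touches pages above 16MB
is then bounced (or deferred) by the implementation without the driver
having to know about it.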

3) I've introduced a "filter function" which may be specified in the dma
tag to allow a client to say, "hey, I may not be able to dma to an address
in this range, but ask my filter function first".  This is used in the
BusLogic driver to deal more efficiently with the VLB dma bugs in that
hardware.  In practice, only a few pages in every MB of memory above 16MB
need to be bounced for these cards.
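
A minimal sketch of such a filter, assuming the convention that it is
called for each page falling inside the tag's questionable range and
returns non-zero for pages that really do need to be bounced.  The actual
address test for the BusLogic bug is omitted; xx_addr_is_broken() is just
a placeholder for the device-specific check:

static int
xxfilter(void *arg, bus_addr_t addr)
{
        struct xx_softc *sc = arg;

        /* Non-zero means "bounce this page"; zero means it is usable. */
        return (xx_addr_is_broken(sc, addr));
}

The filter and its argument are simply the filter/filterarg pair handed to
bus_dma_tag_create in the tag-creation sketch above.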

That being said, the FreeBSD bus dma interfaces are in woeful need of being
documented.  I simply haven't found the time yet.  I'm also quite positive
that there are certain flags or interfaces that I still need to pull or
refresh from NetBSD to make the code work in all cases on all platforms.
The goal behind these changes was to increase efficiency and flexibility
without hurting platform independence and I'm certainly interested in
hearing if I have violated that goal in any way.

--
Justin