Subject: Re: FreeBSD Bus DMA (was Re: AdvanSys board support)
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Justin T. Gibbs <gibbs@plutotech.com>
List: tech-kern
Date: 06/12/1998 10:15:55
>Anyhow, a few options for dealing with deferred requests (I at least
>agree with you that they're necessary, and will readily acknowledge
>that the bus_dma interface doesn't deal with it all that well right now):
>
>	(1) The current bus_dma API allows the caller to specify "you
>	    can go to sleep to wait for resources".  If it's safe for
>	    the client (I'll use your term :-) to block in this manner,
>	    that's an option.  But it's not an option when you're running
>	    in interrupt context, which is a fair amount of the time in
>	    SCSI and network drivers, unfortunately.

I'm glad you agree that this isn't sufficient.

>	(2) You could add a function like the following:
>
>		void bus_dmamap_wait_async __P((bus_dma_tag_t tag,
>		    bus_dmamap_t map, bus_size_t size,
>		    void (*callback)(void *), void *arg));
>
>	    This could register on a queue of things waiting for resources
>	    to become available, but that can't sleep, e.g. the command
>	    entry point of a SCSI controller.  The controller could simply
>	    add the SCSI request to its queue after registering this callback,
>	    and just return "it's queued".  Once the resources become
>	    available, and it's this device's turn, the callback is called
>	    to kick the driver to get it going again.
>
>	    Obviously, if another request came in for a device which
>	    was already in async wait, it would just queue it up, and
>	    not call the async wait function again (since it needs to
>	    preserve order, and it's waiting for the resources to map
>	    the first request in its queue).
>
>	    Since per-client order will already be maintained, the
>	    back-end would be able to re-order the callbacks as it
>	    sees fit when resources become available.
>
>	    I could definitely see the merit in adding this interface
>	    to NetBSD.  In fact, it could address something that's been
>	    troubling me regarding loading of DMA maps for mbuf chains
>	    when using SGMAPs on the Alpha.
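
For concreteness, here's roughly how I'd picture a driver's command entry
point using such an interface.  This is only a sketch: xx_scsi_cmd(),
xxstart(), xx_dma_kick(), and the softc fields are all hypothetical, and
I'm assuming the load fails with EAGAIN when mapping resources are
exhausted:

	/*
	 * Hypothetical command entry point.  Requests are always
	 * queued first so that per-device order is preserved even
	 * while a deferred mapping is outstanding.
	 */
	static void
	xx_scsi_cmd(sc, xs)
		struct xx_softc *sc;
		struct scsi_xfer *xs;
	{
		TAILQ_INSERT_TAIL(&sc->sc_pending_q, xs, xs_q);

		if (sc->sc_waiting)
			return;		/* callback already registered */

		/* Map the request at the head of the queue. */
		xs = TAILQ_FIRST(&sc->sc_pending_q);
		if (bus_dmamap_load(sc->sc_dmat, sc->sc_dmamap, xs->data,
		    xs->datalen, NULL, BUS_DMA_NOWAIT) == EAGAIN) {
			/* Out of resources; arrange to be kicked later. */
			sc->sc_waiting = 1;
			bus_dmamap_wait_async(sc->sc_dmat, sc->sc_dmamap,
			    xs->datalen, xx_dma_kick, sc);
			return;
		}
		xxstart(sc);		/* map loaded; start the command */
	}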

If this interface is added to NetBSD, the differences between the FreeBSD
implementation and the NetBSD implementation are easy to address.  They
reduce to the name of the function and the fact that the FreeBSD callback
receives the dm_seg information as arguments, whereas in NetBSD the
callback would need to extract those parameters from the dma map.  This
is something like 3 lines of #ifdefs (see the sketch below).  I think it
gives both of us what we want as well.  You don't need to modify all of
your deployed code unless you feel the need to do so, and even in that
case you can do it gradually.  FreeBSD gets to make the optimizations it
feels are necessary.  Minimal porting effort is required for both systems
to share the same driver.
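
To illustrate the sort of thing I mean (names hypothetical, and assuming
a FreeBSD-style callback that takes the segment list and an error code):

	#ifdef __FreeBSD__
	static void
	xx_dma_callback(arg, segs, nseg, error)
		void *arg;
		bus_dma_segment_t *segs;
		int nseg, error;
	{
		struct xx_softc *sc = arg;
	#else
	static void
	xx_dma_callback(arg)
		void *arg;
	{
		struct xx_softc *sc = arg;
		/* On NetBSD the same information lives in the dma map. */
		bus_dma_segment_t *segs = sc->sc_dmamap->dm_segs;
		int nseg = sc->sc_dmamap->dm_nsegs;
	#endif

		/* Program the controller's S/G list from segs[0..nseg-1]. */
		xx_load_sglist(sc, segs, nseg);
	}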

In most of the applications I have looked at, the driver can block on
resources and still get its job done effectively.  Where there is an
unacceptable latency penalty for deferment, the implementation can be
tuned to make deferments unlikely.  I would suggest that it is a win
in most cases if the implementation has the fallback option to defer
in worst-case scenarios (assuming the client specifies that a deferment
is okay).  This was part of the rationale I used to decide that the
standard interface should be callback based (sketched below) rather than
adding a separate interface that, in practice, would be the one primarily
used.
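
For reference, the shape I have in mind for the standard interface is
roughly the following (a sketch; the final names and argument list may
differ):

	typedef void bus_dmamap_callback_t __P((void *arg,
	    bus_dma_segment_t *segs, int nseg, int error));

	/*
	 * Load the map.  If resources are immediately available, the
	 * callback is invoked before bus_dmamap_load() returns.  If the
	 * client permits deferment, the load may instead return
	 * EINPROGRESS and invoke the callback once resources free up.
	 */
	int bus_dmamap_load __P((bus_dma_tag_t tag, bus_dmamap_t map,
	    void *buf, bus_size_t buflen, bus_dmamap_callback_t *callback,
	    void *callback_arg, int flags));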

>	(3) Preemptive kernel threads.  This is where we really want to be,
>	    and is the direction we're actively going in.  In this model,
>	    you don't really have "interrupt context" (well, you do, but
>	    drivers don't run in interrupt context ever, except for rare
>	    cases where you don't have much choice).
>
>	    In this model, an instance of a driver (which is its own thread
>	    in the system) can _always_ block because it always runs in
>	    its own context (except for the interrupt stub that wakes
>	    the thread up when a command has completed, etc.).  In this
>	    model, once a resource is relinquished by one thread, a
>	    higher-priority thread that was blocking on the resource can
>	    immediately preempt the running thread and complete its work.

I certainly see the merits of preemptive kernel threads, but I'm hoping
you can clarify how you intend them to be used.  For instance, the CAM
SCSI layer currently uses an SWI to do mid-layer routing and command
completion.  I envision this being replaced by a thread per CPU in the
system to allow parallel completion processing.  Even in this situation,
I don't want these threads to block if it can be avoided.  There are
certain situations having to do with bus, target, or lun rescanning where
blockage for memory resources can occur, but this is such a rare event
that it would be foolish to optimize for it.  Bus dma operations, however,
occur all the time.  I don't want my driver thread to block on a bus dma
operation if that means it cannot service other asynchronous tasks such
as a command completion.  I've heard of people solving this by giving
each I/O its own thread context, but that seems like a recipe for an
unscalable system.  I would expect that the best approach is a
combination of multiple threads and callbacks, so that threads can be
statically allocated for given tasks and the allocation of additional
threads just to ride out deferments can be avoided (see the sketch below).
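
As a rough sketch of what I mean (all names hypothetical): a statically
allocated per-CPU completion thread drains a done queue, and dma
deferments are handled by the callback mechanism rather than by blocking
the thread:

	static void
	xx_completion_thread(arg)
		void *arg;
	{
		struct xx_cpu_queue *q = arg;
		struct xx_ccb *ccb;
		int s;

		s = splbio();
		for (;;) {
			while ((ccb = TAILQ_FIRST(&q->done_q)) != NULL) {
				TAILQ_REMOVE(&q->done_q, ccb, ccb_links);
				splx(s);
				/* Completion never blocks on bus dma. */
				xx_complete(ccb);
				s = splbio();
			}
			/*
			 * Sleep until the interrupt stub queues more
			 * work; tsleep() restores our spl on return.
			 */
			tsleep(q, PRIBIO, "xxcmpl", 0);
		}
	}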

--
Justin