Subject: Re: FreeBSD Bus DMA (was Re: AdvanSys board support)
To: Justin T. Gibbs <gibbs@plutotech.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 06/12/1998 09:50:59
On Fri, 12 Jun 1998 10:15:55 -0600 
 "Justin T. Gibbs" <gibbs@plutotech.com> wrote:

 > If this interface is added to NetBSD, the differences between the FreeBSD
 > implementation and the NetBSD implementation are easy to address.  The
 > difference becomes the name of the function and that the FreeBSD callback
 > function provides dm_seg information as arguments to the callback whereas
 > in the NetBSD, the callback would need to extract these parameters from
 > the dma map.  This is something like 3 lines of #ifdefs.  I think it gives

Err, I think maybe I miscommunicated what I meant...

The callback doesn't say "Ok, now run this specific job", but rather "Hey,
driver: Go run your queue!  You have resources now."
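
To make the distinction concrete, the notification in that model might be
nothing more than this (the names here are made up, just to illustrate):

/*
 * Hypothetical "DMA resources are now available" notification.
 * Note that it doesn't hand the driver a specific mapped job;
 * it just tells the driver to go run its queue again.
 */
void
bha_resources_available(arg)
	void *arg;
{
	struct bha_softc *sc = arg;

	/* "Hey, driver: go run your queue!" */
	bha_start_queue(sc);		/* made-up name */
}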

 > In most of the applications I have looked at, the driver can block on 
 > resources and still get its job done effectively.  Where there is an
 > unacceptable latency penalty for deferment, the implementation can be
 > tuned to make deferments unlikely.  I would suggest that it is a win

In the case of SCSI drivers, or at least when calling down from a device's
strategy routine, blocking isn't safe, because you might be in interrupt
context.  This relates to my point below...

 > I certainly see the merits of preemptive kernel threads, but I'm hoping you
 > can clarify how you intend them to be used.  For instance, the CAM SCSI
 > layer currently uses an SWI to do mid-layer routing and command completion.
 > I envision this being replaced by a thread per CPU in the system to allow
 > parallel completion processing.  Even in this situation, I don't want these
 > threads to block if it can be avoided.  There are certain situations having
 > to do with bus, target, or lun rescanning where blockage for memory 
 > resources can occur, but this is such a rare event that it would be foolish
 > to optimize for it.  Bus dma operations, however, occur all the time.  I
 > don't want my driver thread to block on a bus dma operation if this means
 > it cannot service other asynchronous tasks such as a command completion.
 > I've heard of people solving this by giving each I/O its own thread 
 > context, but this seems like a recipe for an unscalable system.  I would
 > expect that the best approach would be a combination of multiple threads
 > and callbacks so that threads can be statically allocated for given tasks
 > and additional thread allocation to avoid blockage on deferments can be
 > avoided.

There wouldn't be a "thread for SCSI per CPU"... there would be a
"thread per instance of a SCSI driver".  Also, with kernel threads,
the need for software interrupts goes completely away; all you do
is wake up the thread you want to run.  (Software interrupts still
have the problem that they run in interrupt context; I want interrupt
context to largely go away.)

I.e. if you have 4 BusLogics in your system:

	bha0	has its own thread
	bha1	has its own thread
	bha2	has its own thread
	bha3	has its own thread

...these threads can run on any CPU.  And since whenever bha driver code
runs, it will be running in its own context, it can always block and
never defer.  Even while asleep (blocking), the upper level will be
able to queue jobs.  The algorithm might look like this:

/*
 * bha_run_queue:
 *
 *	Run our job queue.  Called when our thread is awakened by the
 *	upper level SCSI code.
 */
void
bha_run_queue(sc)
	struct bha_softc *sc;
{
	struct scsipi_queue *scsiq = &sc->sc_link.scsi_queue;
	struct scsipi_xfer *xfer;
	bus_dmamap_t map;
	int error;

 again:
	/* Grab job off queue. */
	simple_lock(&scsiq->scq_slock);
	xfer = TAILQ_FIRST(&scsiq->scq_jobs);
	simple_unlock(&scsiq->scq_slock);

	/* No work to do, just return. */
	if (xfer == NULL)
		return;

	/*
	 * ... remove the job from the queue, set up the per-transfer
	 * resources (CCB, DMA map), etc. (elided) ...
	 */

	/* Map the transfer. */
	if ((error = bus_dmamap_load(sc->sc_dmat, map, xfer->xs_buf,
	    xfer->xs_buflen, xfer->xs_proc, BUS_DMA_WAITOK)) != 0) {
		/*
		 * Since we can block, this truly is an error, not
		 * just a resource shortage.
		 */
		xfer->xs_error = error;
		thread_wakeup(xfer->xs_waiter);
		goto again;
	}

	/* Start the job. */
	...

	/* Look for more work. */
	goto again;
}

Note that in my world, the driver would never be invoked directly
by the upper-level SCSI code.  That code merely locks the driver's
queue, puts the job on the end, unlocks it, and wakes up the driver's
thread.
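
A minimal sketch of that upper-level enqueue, reusing the queue names from
the example above (the routine name, the TAILQ link field, and the use of
the queue's address as the wakeup channel are all assumptions):

void
scsipi_enqueue(scsiq, xfer)
	struct scsipi_queue *scsiq;
	struct scsipi_xfer *xfer;
{

	/* Put the job on the end of the driver's queue. */
	simple_lock(&scsiq->scq_slock);
	TAILQ_INSERT_TAIL(&scsiq->scq_jobs, xfer, xs_q);  /* xs_q: assumed link field */
	simple_unlock(&scsiq->scq_slock);

	/* Wake the driver's thread; it sleeps on the queue's address. */
	wakeup(scsiq);
}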

In the event the driver is blocking on the map load, the thread_wakeup()
by the upper level won't actually wake it up (the thread is asleep inside
bus_dmamap_load(), waiting on a different address), which has the effect
of freezing the driver's queue, thus enforcing the necessary ordering.
(When the driver instance is idle, obviously it will be sleeping on
some well-known address, so that both the upper level and the interrupt
stub can wake it up.)
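
The thread's main loop could then be as dumb as this (a sketch; I've used
plain tsleep()/wakeup() on the queue's address as the well-known channel,
but any equivalent primitive would do):

void
bha_thread(arg)
	void *arg;
{
	struct bha_softc *sc = arg;
	struct scsipi_queue *scsiq = &sc->sc_link.scsi_queue;

	for (;;) {
		/*
		 * Idle: sleep on the well-known address until the
		 * upper level or the interrupt stub wakes us.
		 */
		(void) tsleep(scsiq, PRIBIO, "bhaidle", 0);

		/* Something happened; go run our queue. */
		bha_run_queue(sc);
	}
}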

Since the driver instance always runs in its own context, it knows that
it can always block if it has to, and never has to defer any requests.

This has other benefits, too... since all driver instances are scheduled
based on varying priority (just like regular processes), you won't encounter
livelock conditions when you're being pounded with interrupts from
e.g. your gigabit ethernet interfaces.

Of course, this is a departure from the traditional BSD kernel structure,
but hey, we wanna move forward :-)

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 650 428 6939