Subject: Re: Problems with ccd (960413)
To: Justin T. Gibbs <gibbs@freefall.freebsd.org>
From: Charles M. Hannum <mycroft@mit.edu>
List: current-users
Date: 05/15/1996 11:42:12
"Justin T. Gibbs" <gibbs@freefall.freebsd.org> writes:

> 
> >That's distinctly untrue.  If there are no SCBs available when a
> >command is queued, and one can't be allocated without sleeping, then
> >the driver, under FreeBSD as well, will return TRY_AGAIN_LATER.
> 
> Only if SCSI_NOSLEEP is set.

That's true, but...

> >> At least for a FreeBSD system running ccd to 6 disks
> >> of a 2940 (16 SCBs), the failure that was reported for NetBSD never occurs
> >> and I know that there are more than 16 active transactions in this system.
> >
> >You could be getting lucky by allocating the SCBs earlier for some
> >other reason, but it's still broken.
> 
> It doesn't matter when you allocate the SCBs (memory allocation).  You will
> still have (using 8 tags) 48 SCSI transactions trying to use those 16 SCBs
> and they must be sleeping *somewhere* for this to work.  They certainly
> aren't sleeping in the upper level SCSI code because the number of total
> openings is greater than the number of SCBs, so they must be sleeping in
> my driver.

Or they're sleeping in the mid-level code because your driver
increased the `openings' before it actually had the SCBs.  See below.

> >> This is not a problem because only requests generated outside of an
> >> interrupt context will cause you to rise above your (previous) threshold.
> >
> >That's not true, either.  If you increase `openings' in an interrupt
> >context (as is done in ahc_done()) and then wake up a higher-level
> >driver (through scsi_done()), the higher-level driver may immediately
> >attempt to queue more commands than there are currently SCBs available
> >for (from the interrupt context), and this lossage mode will ensue.
> 
> wakeup schedules the sleepers to run, but they don't run until we're out of
> the interrupt context.

This has nothing to do with wakeup().  (You'll note that I wrote `wake
up'; perhaps I should have been clearer and written `call', but you
should know what this code does anyway.)  In this case there *aren't*
any sleepers.  The strategy routine has long since returned because
sdstart() ran out of openings the first time around.

> those resources, so you don't need SCSI_NOSLEEP.  free_xs will start
> exactly one transaction for each transaction freed if it doesn't do the
> wakeup which means again that you will only consume the resource you just
> freed and never go above the "previous openings" level that was set even
> if you bumped the opening count during that interrupt context.

If it `doesn't do the wakeup', then it calls sdstart(), which does
*not* limit itself to starting only one I/O operation.  It limits
itself to starting however many there are free openings for.  If the
number of openings just magically increased, then it will try to start
that many more.  Since you haven't made any attempt yet to allocate
the SCBs to match, you lose.  (Why do I feel like I'm repeating
myself?)
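
To make the failure mode concrete, here is a toy model of it in C.  It
is only a sketch under stated assumptions: `toy_link', `toy_start()'
and `toy_done()' are hypothetical names standing in for the link
state, sdstart() and the completion path; none of this is the actual
driver or mid-level code.  The only behaviour it borrows from the
discussion above is that the start routine keeps going for as long as
there are free openings, and that a command started from interrupt
context cannot sleep waiting for an SCB.

```c
#include <assert.h>

/* Hypothetical toy model; these are not the real kernel structures. */
struct toy_link {
    int openings;   /* commands the mid-level code believes it may start */
    int scbs_free;  /* SCBs the adapter driver actually has on hand */
    int queued;     /* I/O requests waiting in the disk driver's queue */
};

/*
 * Stand-in for the start routine: it does not stop after one command,
 * it keeps starting commands while openings remain.  Returns the
 * number of commands that found no SCB available (the case in which
 * a real driver would have to return TRY_AGAIN_LATER).
 */
static int toy_start(struct toy_link *l)
{
    int failed = 0;

    while (l->queued > 0 && l->openings > 0) {
        l->queued--;
        l->openings--;
        if (l->scbs_free > 0)
            l->scbs_free--;     /* command really goes to the adapter */
        else
            failed++;           /* interrupt context: cannot sleep */
    }
    assert(l->scbs_free >= 0);
    return failed;
}

/*
 * Stand-in for command completion in interrupt context.  One SCB and
 * one opening are released; `bump' models `openings' being raised by
 * that many extra slots before matching SCBs have been allocated.
 */
static int toy_done(struct toy_link *l, int bump)
{
    l->scbs_free += 1;
    l->openings += 1 + bump;
    return toy_start(l);        /* called directly; nobody is asleep */
}
```

With every opening and SCB in use and ten requests queued, a
completion with no bump starts exactly one more command and nothing
fails.  A completion that also raises `openings' by three starts four
commands against the single SCB that was just freed, and three of
them fail.  In this model, making the SCBs available before raising
`openings' is what removes the failure.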

Now, at this point you'll probably want to argue that `openings' is
only changed on completion of an inquiry command, and that the inquiry
command always completes with no other I/O pending.  However, this is
provably false.  (Proof left as an exercise for the reader.)

Even if it weren't false, I believe the point I made several months
ago, that the code is not *designed* to avoid these pitfalls, still
stands.  It relies on an incestuous, non-obvious dependency (that
shouldn't exist) between two pieces of code.

> Now you may say that having free_xs only start one transaction is a bug,
> [...]

No; I'm saying that's not what it does.  Please read the code.


P.S. For the record, I *know* how wakeup() works.