current-users: Re: Problems with ccd (960413)

Subject: Re: Problems with ccd (960413)
To: Justin T. Gibbs <gibbs@freefall.freebsd.org>
From: Charles M. Hannum <mycroft@mit.edu>
List: current-users
Date: 05/14/1996 22:46:38

"Justin T. Gibbs" <gibbs@freefall.freebsd.org> writes:

> 
> >> The implication is that your SCSI controller is failing to send the 
> >> command.  It could also be that the SCSI subsystem is unable to malloc() 
> >> storage for a data structure (note the SCSI_NOSLEEP).
> 
> Ahhh.  The infamous SCSI_NOSLEEP problem.  This was fixed in FreeBSD more
> than a year ago (when wcarchive was using a 2742 controller) and is
> caused by the SCSI system setting SCSI_NOSLEEP more often than required.

It appears all that was done was to make the start routines not use
SCSI_NOSLEEP when called from the strategy routine.  It is still set
when called from an interrupt (i.e. on command completion), so if you
have enough I/O pending to overflow the SCBs already allocated, you
will still lose.  This is made more noticeable by ccd because it can
have the effect of queueing many requests at once to the same
controller.
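
To make the failure mode concrete, here is a minimal user-space sketch.
It is not the NetBSD or FreeBSD code: `struct scb`, `struct scb_pool`,
`pool_grow()` and `get_scb()` are hypothetical names, the flag value is
invented, and malloc() stands in for a kernel allocation that may need
to sleep.  It only shows why a request made with SCSI_NOSLEEP set fails
once the SCBs already allocated are all in use.

```c
#include <stddef.h>
#include <stdlib.h>

#define SCSI_NOSLEEP 0x01   /* flag name from the thread; the value is made up */

/* Hypothetical stand-in for a controller's per-command control block. */
struct scb {
    struct scb *next;
};

/* Hypothetical per-controller pool of free SCBs. */
struct scb_pool {
    struct scb *free_list;
    int         allocated;  /* total SCBs ever created for this controller */
};

/* Create one more SCB; in a kernel this is the step that may have to sleep. */
static int
pool_grow(struct scb_pool *p)
{
    struct scb *s = malloc(sizeof(*s));

    if (s == NULL)
        return -1;
    s->next = p->free_list;
    p->free_list = s;
    p->allocated++;
    return 0;
}

/* Take an SCB from the pool, growing the pool only if sleeping is allowed. */
static struct scb *
get_scb(struct scb_pool *p, int flags)
{
    struct scb *s;

    if (p->free_list == NULL) {
        if (flags & SCSI_NOSLEEP)
            return NULL;        /* e.g. called from an interrupt: give up */
        if (pool_grow(p) != 0)
            return NULL;
    }
    s = p->free_list;
    p->free_list = s->next;
    s->next = NULL;
    return s;
}
```

With one SCB in the pool, a first get_scb(&pool, SCSI_NOSLEEP)
succeeds and a second returns NULL, while the same call with the flag
cleared grows the pool instead.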

As we discussed at Usenix, one way to hack around this is to increase
the `openings' (correct English spelling, BTW B-)) count from
*_scsi_cmd() only when a command is queued with SCSI_NOSLEEP cleared,
and to ensure, whenever the count changes, that all of the data
structures needed for all openings on all devices attached to that
controller are preallocated.  The ahc driver does not do this.
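
A rough sketch of that workaround follows, again in user-space C and
not taken from any driver.  `struct cmd`, `struct ctlr`,
`total_openings()` and `raise_openings()` are hypothetical names,
MAX_DEVS and the flag value are arbitrary, and malloc() stands in for
an allocation that is allowed to sleep.  The point is the ordering: the
count is raised only when sleeping is permitted, and only after enough
structures exist to cover every opening on every attached device.

```c
#include <stddef.h>
#include <stdlib.h>

#define SCSI_NOSLEEP 0x01   /* flag name from the thread; the value is made up */
#define MAX_DEVS     8      /* arbitrary limit for this sketch */

/* Hypothetical per-command structure (an SCB or similar). */
struct cmd {
    struct cmd *next;
};

/* Hypothetical controller state; ndevs must not exceed MAX_DEVS. */
struct ctlr {
    int         ndevs;               /* devices attached to this controller */
    int         openings[MAX_DEVS];  /* commands each device may have queued */
    int         prealloc;            /* command structures allocated so far */
    struct cmd *free_list;
};

/* Sum of the openings of all devices on the controller. */
static int
total_openings(const struct ctlr *c)
{
    int i, sum = 0;

    for (i = 0; i < c->ndevs; i++)
        sum += c->openings[i];
    return sum;
}

/*
 * Try to give device `dev' one more opening.  Returns 1 if the count was
 * raised, 0 if it was left alone.
 */
static int
raise_openings(struct ctlr *c, int dev, int flags)
{
    int needed;

    if (flags & SCSI_NOSLEEP)
        return 0;               /* may not sleep: never grow here */
    if (dev < 0 || dev >= c->ndevs)
        return 0;

    /* Allocate first, so the count never promises more than exists. */
    needed = total_openings(c) + 1;
    while (c->prealloc < needed) {
        struct cmd *s = malloc(sizeof(*s));  /* stand-in for a sleeping malloc */

        if (s == NULL)
            return 0;
        s->next = c->free_list;
        c->free_list = s;
        c->prealloc++;
    }
    c->openings[dev]++;
    return 1;
}
```

Because prealloc never falls below the total number of openings, a
command started with SCSI_NOSLEEP set can always be handed a structure
that already exists, as long as the openings limit is honoured.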

As I've said before, the best solution is to eliminate the need for
SCSI_NOSLEEP.  This requires more restructuring of the SCSI code,
however.