tech-kern: Re: The thorpej

Subject: Re: The thorpej_scsipi branch
To: None <mjacob@feral.com>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-kern
Date: 12/28/2000 09:07:46
> I'm a little dubious about handling QFULL out of band- by the time you
> activate a thread the QFULL condition is likely to be done with. This is also
> a good time to examine the queue full backoff algorithms and maybe pay closer
> attention to the RFC that Sean Doran pointed me at (rfc2001)-  although I was
> talking to Bob Snively about this a couple of weeks ago and he and I are a
> little dubious that pure congestion control is what's needed given the nature
> of SCSI devices (he's of the opinion that as soon as a QFULL condition is
> done, jam it to the max- hence my slight unease at anything in QFULL handling
> that has race conditions or any amount of complexity other than requeueing
> logic).

I guess I'd have to agree -- I don't think that TCP/IP congestion
control is similar. The QFULL condition is more analogous to the
flow-control window being closed because the application on the other
end just isn't chewing through the data quickly enough.

The TCP congestion avoidance/congestion control algorithms assume that
excess load is *dropped* by the network [*], and thus should be
avoided if at all possible, and also that there are multiple sources
sharing a bottleneck. As a result, they back off very aggressively
when congestion is detected (i.e., cutting the number of packets in
flight -- which is only partly analogous to the number of outstanding
SCSI transactions -- in half).

If you were dealing with multiple initiators sharing a target, an
additive increase/multiplicative decrease scheme might make sense from
the point of view of fairly sharing the target among multiple clients,
but that's definitely not the common case..
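
To make the difference between the two policies concrete, here is a
rough sketch in C. It is not taken from the scsipi code: struct
tgt_queue, the aimd_*/jam_* functions and completions_to_recover() are
all hypothetical names, and the "jam it to the max" half is only one
possible reading of that suggestion (hold at whatever the target
accepted, then reopen the full depth on the first completion).

```c
#include <assert.h>

/* Hypothetical per-target queue state; not a real scsipi structure. */
struct tgt_queue {
	int openings;     /* commands we currently allow in flight */
	int max_openings; /* configured or advertised maximum depth */
};

/* AIMD, TCP style: halve the allowed depth on QFULL (never below 1). */
static void
aimd_qfull(struct tgt_queue *q)
{
	q->openings /= 2;
	if (q->openings < 1)
		q->openings = 1;
}

/* AIMD: creep back up by one per successful completion. */
static void
aimd_done(struct tgt_queue *q)
{
	if (q->openings < q->max_openings)
		q->openings++;
}

/* "Jam it to the max" (assumed reading): hold at what the target took. */
static void
jam_qfull(struct tgt_queue *q, int accepted)
{
	q->openings = accepted < 1 ? 1 : accepted;
}

/* "Jam it to the max": the first completion reopens the full depth. */
static void
jam_done(struct tgt_queue *q)
{
	q->openings = q->max_openings;
}

/* Count completions needed before the full depth is allowed again. */
static int
completions_to_recover(struct tgt_queue *q, void (*done)(struct tgt_queue *))
{
	int n = 0;

	assert(q->max_openings >= 1);
	while (q->openings < q->max_openings) {
		done(q);
		n++;
	}
	return n;
}
```

With a maximum depth of 64, a single QFULL under the AIMD policy drops
the limit to 32 and it takes 32 completions to climb back, while the
second policy is back at full depth after one completion.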

I think in the SCSI case you want to expose as many pending I/O
requests to the disk as possible to get maximum opportunity for the
disk to optimize the order in which it processes them.... unless
drives slow down when they're nearing saturation.

It would be an interesting exercise to measure throughput and latency
vs. number of outstanding transactions for a number of different
devices and workloads and see if that says anything useful.
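
For what it's worth, reducing the raw numbers from such a run might
look something like the sketch below. Everything here is an
assumption made for illustration: struct depth_sample and the three
helpers are hypothetical, and how the counters get collected is left
open. The last helper uses Little's law (mean number in the system =
throughput x mean latency) as a sanity check that the run really did
hold the queue depth it claims.

```c
/* Hypothetical record for one benchmark run at a fixed queue depth. */
struct depth_sample {
	int depth;               /* outstanding transactions held in the run */
	unsigned long completed; /* I/Os completed during the run */
	double elapsed_s;        /* wall-clock length of the run, seconds */
	double latency_sum_s;    /* sum of per-I/O latencies, seconds */
};

/* Completed I/Os per second; 0 if the run has no usable duration. */
static double
throughput_iops(const struct depth_sample *s)
{
	if (s->elapsed_s <= 0.0)
		return 0.0;
	return (double)s->completed / s->elapsed_s;
}

/* Mean per-I/O latency in seconds; 0 if nothing completed. */
static double
mean_latency_s(const struct depth_sample *s)
{
	if (s->completed == 0)
		return 0.0;
	return s->latency_sum_s / (double)s->completed;
}

/* Little's law: this should come out close to s->depth. */
static double
implied_depth(const struct depth_sample *s)
{
	return throughput_iops(s) * mean_latency_s(s);
}
```

If implied_depth() comes out well below the nominal depth, the device
(or the driver) was not actually keeping that many transactions
outstanding, and the throughput and latency figures for that depth
should be read accordingly.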

						- Bill

[*] ECN -- which allows the network to mark packets as "congestion
experienced" -- changes this somewhat, but ECN is not widely
deployed, and even in the face of ECN packet marking, networks still
need to drop packets when sources don't back off quickly enough.