Subject: Re: I/O priorities
To: Bill Studenmund <wrstuden@netbsd.org>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-kern
Date: 06/21/2002 10:30:42
In message <Pine.NEB.4.33.0206201044020.23822-100000@vespasia.home-net.internetc
>I think that an i/o scheduler is the long-term solution to this problem.
>It won't make 1.6 (it's WAY too late for that), but it's what we should
>do.

I think we need to go back to basics.  Exactly what is the actual
problem here?

At a coarse level, the symptoms sound awfully like scenarios in
standard OS textbooks: simplistic seek optimization causes starvation.
The textbook approach is to do SCAN elevator sorts: once the disk arm
starts a sweep, new requests are postponed until the next, reverse
sweep. Within a sweep, requests are serviced in order.  (Doing
circular scans, or C-SCAN, gives lower variance under heavy loads.)
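
For concreteness, here's a minimal sketch of that textbook SCAN
discipline in C.  Illustration only: this is not NetBSD's disksort()
or any existing kernel interface, and the struct and function names
are invented for the example.

	/*
	 * Minimal SCAN sketch (hypothetical, for illustration).
	 * Requests arriving mid-sweep are deferred to the next
	 * (reverse) sweep; within a sweep, requests stay sorted by
	 * block number in the direction of arm travel.
	 */
	#include <stdlib.h>

	struct req {
		long		 blkno;		/* target block */
		struct req	*next;
	};

	struct scanq {
		struct req	*active;	/* current sweep, sorted */
		struct req	*pending;	/* deferred to next sweep */
		int		 ascending;	/* direction of current sweep */
	};

	/* Insert r into *head, keeping it sorted for one direction. */
	static void
	sorted_insert(struct req **head, struct req *r, int ascending)
	{
		struct req **pp = head;

		while (*pp != NULL &&
		    (ascending ? (*pp)->blkno <= r->blkno
		               : (*pp)->blkno >= r->blkno))
			pp = &(*pp)->next;
		r->next = *pp;
		*pp = r;
	}

	/*
	 * New requests always go on the pending queue, sorted for the
	 * next sweep's direction: the sweep in progress is never
	 * re-entered, which is what prevents starvation.
	 */
	void
	scan_enqueue(struct scanq *q, struct req *r)
	{
		sorted_insert(&q->pending, r, !q->ascending);
	}

	/* Take the next request; swap queues and reverse at sweep end. */
	struct req *
	scan_dequeue(struct scanq *q)
	{
		struct req *r;

		if (q->active == NULL) {
			q->active = q->pending;
			q->pending = NULL;
			q->ascending = !q->ascending;
		}
		if ((r = q->active) != NULL)
			q->active = r->next;
		return r;
	}

The anti-starvation property lives entirely in scan_enqueue(): since
an in-progress sweep is never re-entered, no request can be passed
over more than one full pass of the arm.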

Manuel's scenario seems to be BC + resource-hog + `light' load
(the xterm-running-top case: is the traffic page-ins for the xterm
scroll buffer?).  Here the `light' load generates requests synchronously
with its own execution; therefore it gets (at most) one request per scan.
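
To put rough, purely illustrative numbers on that: if the resource
hog keeps (say) 100 requests queued and each takes ~10 ms to service,
a full sweep runs on the order of a second.  A process doing
synchronous, one-at-a-time I/O is then limited to roughly one I/O per
second, no matter how little total I/O it asks for.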

So it sounds like the problem is that the interactive response of the
xterms is crippled by limiting it to one (pagein?) I/O per elevator
(SCAN, not C-SCAN) pass. Is that correct?  (Surely the problem isn't
that we broke elevator-sort somewhere along the line? I'd UTSL, but my
source box is currently running Windows.....)

If the cause of the xterm's I/O is something like a (circular) scan
through a scroll buffer, the rate of xterm's scan is slow enough to
make the pages in that region look like good candidates for pageout,
and xterm wants to re-dirty those pages each time top's 3-sec refresh
fires, then a SCAN-style algorithm is guaranteed to give `poor'
performance: each refresh must first page that region back in, at
(at most) one synchronous request per sweep.

Worse, simple two-queue schemes don't really fix the problem. Either
you periodically admit the entire `large, batch-I/O' queue to an
elevator pass (killing the low-rate xterm for that pass); or you admit
only some small part of the large batch-I/O queue per pass of the arm
across the disk.  In the latter case, if we admit 1/Nth of the
batch I/O per elevator pass, then the batch I/Os are doing N times
as many seeks per I/O request. We want N to be fairly large, so that's
a significant hit.

Or am I missing something?