Subject: Re: IO Congestion Control
To: Steven M. Bellovin <smb@cs.columbia.edu>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 09/11/2006 20:26:37
Steven M. Bellovin wrote:
> On Mon, 11 Sep 2006 17:05:44 -0700, Bill Studenmund <wrstuden@NetBSD.org>
> wrote:
> 
> 
>> Please suggest an algorithm and implementation. The more detailed, the 
>> better.
>>
> There's a well-known, simple algorithm that's used in many places,
> including TCP (for average round trip time) and many operating systems for
> their schedulers.
> 
> If L_i is the average latency time -- that is, the time from when a write
> request is queued to when it finishes -- and L' is the latency of the
> most recent completed request, then
> 
> 	L_(i+1) = k*L_i + (1-k)*L'
> 
> You can select k depending on how much weight you want to give the
> historical average versus the most recent request, but .5 is quite
> common.  You'll find that exact equation in Section 3.7 of RFC 793, for
> example.
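
In plain C, that smoothing step is a one-liner; here is a sketch
using fixed-point arithmetic (the kernel avoids floating point) with
k = 1/2.  The names are invented for illustration, not anything in
the tree:

    #include <stdint.h>

    #define LAT_SCALE   8   /* fixed-point denominator */
    #define LAT_K       4   /* k = 4/8 = 0.5 */

    static uint64_t avg_lat;    /* L_i, in e.g. microseconds */

    /* Fold the latency of a just-completed write into the average. */
    static void
    latency_update(uint64_t sample)     /* L', same units as avg_lat */
    {
            /* L_(i+1) = k*L_i + (1-k)*L' */
            avg_lat = (LAT_K * avg_lat +
                (LAT_SCALE - LAT_K) * sample) / LAT_SCALE;
    }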

There's a problem with applying this to disks...  Unlike a network,
if I'm issuing sequential writes and someone issues a write to a far
LBA, that's going to wreck my latency because I have to seek back.
That wouldn't happen on a network where I'm communicating with a
local host on my LAN and then send a packet over the internet.

I think the proper test is not the raw write rate.  Instead, the
right measure is the frequency of ordered writes (fsync, metadata
writes, etc.) and the amount of write data they force to be written.
I can do writes all day and they will be deferred; a penalty should
be applied only when a forced write of that data happens.  Feeding
into that should be the difference between the LBAs of successive
writes: the larger the difference, the higher the penalty.
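
Roughly, something like this (all names invented; the log keeps the
seek term bounded):

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT  12  /* assume 4 KB pages */

    static uint64_t last_lba;   /* LBA of the previous forced write */
    static uint64_t penalty;    /* accumulated throttling penalty */

    /* Charge a forced (ordered) write of nbytes at the given LBA. */
    static void
    penalty_forced_write(uint64_t lba, size_t nbytes)
    {
            uint64_t delta;
            unsigned bits;

            delta = (lba > last_lba) ? lba - last_lba : last_lba - lba;
            last_lba = lba;

            /* bits ~= log2(delta): crude measure of seek distance */
            for (bits = 0; delta != 0; delta >>= 1)
                    bits++;

            /* pages forced out, scaled up by how far we had to seek */
            penalty += (nbytes >> PAGE_SHIFT) * (1 + bits);
    }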

That affects the write rate, but doesn't back-pressure dirty pages in
the UBC.  For that, you need a different approach.  We should track
how many UBC pages each process has dirtied.  Each process should
then be able to have a small working set of dirty UBC pages that
won't be flushed unless the number of free pages gets really small.
So when a process needs to dirty a page and the limit of dirty pages
has been reached, create a card deck (say 1024 cards), and assign a
number of cards to each process proportional to the number of dirty
UBC pages it has.  Some processes won't get any cards (inetd, for
instance).  Then choose a random card, see which process it belongs
to, and force some of that process's dirty pages to be cleaned.
These pages are then used for the stalled process.  By constructing
the deck properly, processes that are heavily polluting the UBC with
dirty pages will have their own pages reclaimed, limiting the rate at
which they can dirty more pages.
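
Drawing a card from that deck is just a lottery; a sketch (the
structures are invented, and random() stands in for whatever entropy
source is handy):

    #include <stdlib.h>

    struct pdirty {
            struct pdirty   *pd_next;
            unsigned         pd_ndirty;     /* dirty UBC pages owned */
    };

    /*
     * Pick a victim with probability proportional to pd_ndirty.
     * Drawing a ticket in [0, total) is equivalent, modulo rounding,
     * to dealing 1024 cards out in proportion and picking one; a
     * process with no dirty pages (inetd) can never win.
     */
    static struct pdirty *
    deck_draw(struct pdirty *list, unsigned total_dirty)
    {
            struct pdirty *pd;
            unsigned ticket;

            if (total_dirty == 0)
                    return NULL;
            ticket = random() % total_dirty;
            for (pd = list; pd != NULL; pd = pd->pd_next) {
                    if (ticket < pd->pd_ndirty)
                            return pd;      /* clean some of its pages */
                    ticket -= pd->pd_ndirty;
            }
            return NULL;    /* not reached if total_dirty is accurate */
    }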

At least, those are my semi-random thoughts at this time.
-- 
Matt Thomas                     email: matt@3am-software.com
3am Software Foundry              www: http://3am-software.com/bio/matt/
Cupertino, CA              disclaimer: I avow all knowledge of this message.