Subject: Re: Smoother writing for LFS
To: Konrad Schroder <perseant@hhhh.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 10/23/2006 21:47:33
On Mon, Oct 23, 2006 at 05:42:29PM -0700, Konrad Schroder wrote:
> On Mon, 23 Oct 2006, Thor Lancelot Simon wrote:
> 
> For LFS, this is almost done already.  We keep a per-filesystem page 
> count, though it may be somewhat inaccurate since it isn't kept by the VM 
> system itself.

How does the count get incremented?

> In the past when I've tried doing something like what you're describing, 
> performance always degraded, so I didn't pursue it further.  I wasn't, of 
> course, testing the specific case you're trying to address.  It sounded at 
> the time, too, as if keeping track of the number of dirty pages per mount 
> point at the VM level would be an overall loss (especially if LFS is the
> only fs that ever uses the data), so it may be possible that the
> performance loss was due to an inaccurate count of dirty pages.

Did you limit the amount written?  My thought is to trickle out writes a
few segments at a time, to avoid the furious bursts of activity that can
make it look, to users, as if the whole system briefly grinds to a halt.  So
instead of writing everything we could, every 0.1 seconds, we'd write,
say, no more than 25% of what we knew we could write.  You correctly
observe that the "25%" is entirely ad hoc.  We'd have to play with it,
but the idea is to leave most of the bandwidth available for reads or
for writes not scheduled by this trickle algorithm.
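
To make the idea concrete, here's a rough sketch of the policy; the
function and the constant are made up for illustration, not anything
in the actual LFS code:

    /*
     * Each tick (say every 0.1 seconds), flush at most some fraction of
     * the segments we currently know we could write, leaving the rest
     * of the bandwidth for reads and for writes that this trickle
     * mechanism didn't schedule.
     */
    #define TRICKLE_PERCENT 25      /* entirely ad hoc; would need tuning */

    /* How many segments to flush on this tick. */
    unsigned int
    trickle_quota(unsigned int writable_segs)
    {
            unsigned int quota = writable_segs * TRICKLE_PERCENT / 100;

            /* Always make some progress if there is anything to write. */
            if (quota == 0 && writable_segs > 0)
                    quota = 1;
            return quota;
    }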

The two cases for which the so-called smooth syncer in syncfs is just
plain broken are the "all writes go to a small number of files" case
and the "most writes are metadata writes" case.  To handle the latter,
we'd have to also look at the buffer cache buffers.
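
That probably means keeping, per mount point, a tally of dirty metadata
bytes alongside the page count.  All the names below are invented for
the sake of the example:

    /* Per-mount accounting the trickle syncer would want to consult. */
    struct trickle_stats {
            unsigned long   dirty_pages;        /* dirty file data pages */
            unsigned long   dirty_meta_bytes;   /* dirty buffer cache buffers */
    };

    /* Adjust the tally whenever a metadata buffer is dirtied or cleaned. */
    void
    trickle_meta_adjust(struct trickle_stats *ts, long delta)
    {
            ts->dirty_meta_bytes += delta;
    }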

Actually, metadata writes are small.  I wonder if it'd be effective to
just clear all of them out every time we accumulated a segment's worth?
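
Something like the following check, using the per-mount tally sketched
above ("segsize" and the names are just placeholders):

    /*
     * Write the dirty metadata out as a batch once roughly a segment's
     * worth has piled up; the threshold is as ad hoc as the 25% above.
     */
    int
    metadata_flush_due(const struct trickle_stats *ts, unsigned long segsize)
    {
            return ts->dirty_meta_bytes >= segsize;
    }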

Thor