Subject: Re: memory tester shows up swap/page tuning bug [was Re: BUFFERCACHE,
To: Mike Hibler <mike@fast.cs.utah.edu>
From: John S. Dyson <toor@dyson.iquest.net>
List: tech-kern
Date: 09/18/1996 01:16:50
> 
> > From: "John S. Dyson" <toor@dyson.iquest.net>
> > Subject: Re: memory tester shows up swap/page tuning bug [was  Re: BUFFERCACHE,
> > To: jonathan@DSG.Stanford.EDU (Jonathan Stone)
> > Date: Sun, 15 Sep 1996 20:29:45 -0500 (EST)
> > 
> > > 
> > > I've looked at the 4.4-Lite2 VM changes and nothing leaps out as being
> > > related to this.   Does anyone (Mike Hibler, perhaps, or any of the
> > > FreeBSD people) recognize these symptoms at all?  It's a very annoying
> > > bug, to say the least....
> > > 
> > 
> > Try this.  Sleeping on lbolt is a bit bogus.  Waking up on vm_pages_needed
> > is likely more correct, with a timeout of some reasonable amount of time.
> > (I don't know what the hz variable is on NetBSD, so I just filled in a
> > number.)  No guarantees -- even if the problem is what I think it is, this
> > fix is not quite correct.  A correct fix is much, much more complex :-).
> > 
> > Good luck!!!
> > 
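(To spell out what I meant above -- an untested sketch; the timeout
value is just a filled-in number since I don't know NetBSD's hz:)

	/*
	 * In vm_pageout(), instead of the bogus nap on lbolt:
	 *
	 *	tsleep((caddr_t)&lbolt, PZERO, "pageout", 0);
	 *
	 * sleep on vm_pages_needed so a real memory shortage wakes
	 * the daemon, with a tick timeout as a backstop.
	 */
	(void) tsleep((caddr_t)&vm_pages_needed, PZERO, "pageout", 20);
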
> I think that taking the tsleep out entirely will achieve about the same
> effect as John's fix.  By sleeping on vm_pages_needed you likely context
> switch to someone else who needs memory and immediately wakes pageout up
> again.  Pageout will then discover that there are still no resources of
> the type it needs and do the same thing again.
> 
I agree with your statement.  On FreeBSD we block on the depleted
resource, waiting for a free "swap control block" to become available.
In fact, some of the performance anomalies reported after applying the
"pseudo-fix" may be CPU-utilization problems: the extra context
switching turns the pageout daemon into a CPU time sink.
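To make that concrete, here is an untested sketch of the idea; the
"swcb" names are made up (the real FreeBSD swap-pager structures are
named differently).  The allocation path sleeps until a swap control
block is actually freed, so the daemon wakes only when the depleted
resource is replenished, not whenever anybody needs memory:

	struct swcb {				/* hypothetical swap control block */
		struct swcb	*sc_next;
	};
	static struct swcb *swcb_freelist;	/* free swcb's */
	static int swcb_wanted;			/* somebody is sleeping */

	struct swcb *
	swcb_alloc()
	{
		struct swcb *scp;
		int s;

		s = splbio();
		while ((scp = swcb_freelist) == NULL) {
			swcb_wanted = 1;
			/* block until swcb_free() does the wakeup */
			tsleep((caddr_t)&swcb_freelist, PVM, "swcbwt", 0);
		}
		swcb_freelist = scp->sc_next;
		splx(s);
		return (scp);
	}

	void
	swcb_free(scp)
		struct swcb *scp;
	{
		int s;

		s = splbio();
		scp->sc_next = swcb_freelist;
		swcb_freelist = scp;
		splx(s);
		if (swcb_wanted) {
			swcb_wanted = 0;
			wakeup((caddr_t)&swcb_freelist);
		}
	}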

>
> freed immediately to ease the memory crunch.  The problem with doing this is
> that it makes the algorithm a bit unfair.  Under heavy loads, processes which
> don't dirty their pages will tend to have their pages thrown out before those
> of processes with dirty pages.  It might take several revs of the clock hand
> until all the dirty pages get cleaned and tossed.  I think the latter is why I
> added the sleep in the first place.  As I recall, the original code never
> slept, and just moved on to the next inactive page.
> 
On FreeBSD we have spent a LOT of time on the pageout algorithm, and
I tend to agree that it is worse to go on to the next inactive page:
doing so has some screwy performance side effects.  If the swap pager
is out of resources, multiple I/O's are likely already queued, so at
least one I/O subsystem will be busy.  There usually aren't that many
dirty vnode-backed pages, so there will not be any blocking vnode
writes.  If you let the pageout daemon march down the inactive list
freeing pages, it will likely throw out potentially useful .text
pages.  It is a hard judgement call, but I think it is better to wait
on some kind of condition.  From a performance point of view it might
even be desirable (though it is EXTREMELY ugly and bogus) to wait for
.1 sec instead of sleeping on vm_pages_needed -- see the sketch below.
In any case, everything I have suggested is hackery; the problem
really deserves careful analysis in the long term.
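
To make the ugly variant concrete: it is just a bounded nap on a
channel that nobody ever wakes (an untested sketch -- "pgnap" and the
private channel are made up):

	static int pageout_nap;	/* private channel: nothing ever wakes it */

	/*
	 * tsleep()'s timeout argument is in clock ticks, so with
	 * hz = 100 this is 10 ticks = .1 sec.  Since no wakeup()
	 * ever hits &pageout_nap, the timeout is the only way out,
	 * and the daemon cannot be stampeded awake by every process
	 * that needs memory.
	 */
	(void) tsleep((caddr_t)&pageout_nap, PZERO, "pgnap", hz / 10);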

>
> Well, it has been a while since I looked closely at this code so I would
> defer to John if he disputes any of this.
> 
I think that you and I agree.

John