Subject: Re: memory tester shows up swap/page tuning bug [was Re: BUFFERCACHE,
To: None <tech-kern@NetBSD.ORG>
From: Mike Hibler <mike@fast.cs.utah.edu>
List: tech-kern
Date: 09/17/1996 23:11:31
> From: "John S. Dyson" <toor@dyson.iquest.net>
> Subject: Re: memory tester shows up swap/page tuning bug [was  Re: BUFFERCACHE,
> To: jonathan@DSG.Stanford.EDU (Jonathan Stone)
> Date: Sun, 15 Sep 1996 20:29:45 -0500 (EST)
> 
> > 
> > I've looked at the 4.4-Lite2 VM changes and nothing leaps out as being
> > related to this.   Does anyone (Mike Hibler, perhaps, or any of the
> > FreeBSD people) recognize these symptoms at all?  It's a very annoying
> > bug, to say the least....
> > 
> 
> Try this.  Sleeping on lbolt is a bit bogus.  Waking up on vm_pages_needed
> is likely more correct, with a timeout of some reasonable amount of time.
> (I don't know what the hz variable is on NetBSD, so I just filled in a
> number.)  No guarantees -- this fix (if I think what the problem is)
> is not quite correct.  A correct fix is much much more complex :-).
> 
> Good luck!!!
> 
I think that taking the tsleep out entirely will achieve about the same
effect as John's fix.  By sleeping on vm_pages_needed you likely context
switch to someone else who needs memory and immediately wakes pageout up
again.  Pageout will then discover that there are still no resources of
the type it needs and do the same thing again.

The point of the original code was that there is no reason for pageout to
continue running until the resources it needs are available.  I'm not exactly
sure why I arbitrarily chose to wait for a second; that was pretty bogus.
But hey, I did this on an hp300 with an HP-IB disk, and pageouts took a little
while :-)

What I might suggest as a refinement would be to have pageout set some
specific variable like vm_pageout_need_resources (ugh, what a Mach-like
name!) and then sleep on that.  Then any pager which generates a
VM_PAGER_AGAIN return code would be responsible for waking up the pageout
daemon when the particular resources are available again.  In the case of
the swap_pager, what you are probably running out of is the cleanlist
headers (i.e., the swap_pager is backed up).  So when one becomes available
in swap_pager_clean you can do a wakeup if vm_pageout_need_resources is set.
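
Roughly, the handshake I have in mind looks like the sketch below.  It is a
single-threaded userspace model, not kernel code: NCLEAN, clean_free,
wakeup_pageout(), swap_pager_start() and the other helper names are made up
for illustration, and the counter stands in for a real
wakeup(&vm_pageout_need_resources).  A real kernel would also have to keep
the wakeup from racing with the sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; this models the idea, not NetBSD code. */
#define NCLEAN 4                        /* assumed size of the cleanlist pool */

static int  clean_free = NCLEAN;        /* cleanlist headers currently free */
static bool vm_pageout_need_resources;  /* set while pageout is waiting */
static int  wakeups;                    /* counts simulated wakeup() calls */

/* Stand-in for the kernel's wakeup() on the flag's address. */
static void wakeup_pageout(void)
{
    wakeups++;
}

/* Pager side: try to start a pageout.  Returning false models a
 * VM_PAGER_AGAIN return because no cleanlist header is free. */
static bool swap_pager_start(void)
{
    if (clean_free == 0)
        return false;
    clean_free--;
    return true;
}

/* Pageout side: record that we are waiting.  In a kernel this would be
 * followed by a sleep on &vm_pageout_need_resources. */
static void pageout_wait_for_resources(void)
{
    vm_pageout_need_resources = true;
}

/* Completion side (the swap_pager_clean role): a header is returned,
 * and pageout is woken only if it said it was waiting. */
static void swap_pager_clean_one(void)
{
    assert(clean_free < NCLEAN);
    clean_free++;
    if (vm_pageout_need_resources) {
        vm_pageout_need_resources = false;
        wakeup_pageout();
    }
}
```

The point of the flag is that completions which nobody is waiting for cost
nothing, and pageout is woken exactly when the resource it lacked comes back
rather than on a timer.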

Now comes the part where I tell you that I just lied "a little bit" in what
I said.  Sleeping in any form for any length of time is not exactly the
right thing to do.  Saying that there is "no reason" for pageout to continue
running is stretching it.  Just because one pager runs out of resources
doesn't mean that all of them have.  In reality, most of the pageout activity
is going through the swap_pager so my statement is still reasonably accurate
in that case.  The bigger issue is that, by blocking pageout, you are
preventing it from scanning further pages which might be clean and could be
freed immediately to ease the memory crunch.  The problem with letting it
scan on instead is that it makes the algorithm a bit unfair.  Under heavy
loads, processes which don't dirty their pages will tend to have their pages
thrown out before those processes with dirty pages.  It might take several
revs of the clock hand until all the dirty pages get cleaned and tossed.  I
think that unfairness is why I added the sleep in the first place.  As I
recall, the original code never slept, and just moved on to the next inactive
page.
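
To make the unfairness concrete, here is a toy model of the "never sleep,
just move on" scan.  Everything in it is an assumption for illustration:
struct page, scan_inactive() and pager_slots are invented names, and a
pageout is treated as completing instantly, so a dirty page simply becomes
clean and is freed on a later pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page record for the model; not the kernel's vm_page. */
struct page {
    bool dirty;
    bool freed;
};

/*
 * One pass over the inactive list.  Clean pages are freed on the spot.
 * Dirty pages are handed to the pager while it has slots left; once it
 * is backed up they are skipped instead of sleeping.
 * Returns the number of pages freed on this pass.
 */
static size_t scan_inactive(struct page *pg, size_t n, size_t pager_slots)
{
    size_t nfreed = 0;

    assert(pg != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pg[i].freed)
            continue;
        if (!pg[i].dirty) {
            pg[i].freed = true;         /* clean: free immediately */
            nfreed++;
        } else if (pager_slots > 0) {
            pager_slots--;
            pg[i].dirty = false;        /* cleaned; freed on a later pass */
        }
        /* else: pager backed up, skip this page and keep scanning */
    }
    return nfreed;
}
```

With a list of dirty, clean, dirty, clean and one pager slot per pass, the
two clean pages go on the first pass while the dirty ones take two more
passes to disappear, which is the bias against processes that don't dirty
their pages.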

The clock hand.  That reminds me of one other thing.  The Mach pageout
daemon has only one hand.  This means that under heavy load and with a large
physical memory one sweep takes a significant amount of time.  In the scenario
where most memory pages are dirty, this leads to the pattern:

Pass 1. Everything is dirty, frantically race through memory paging
	everything out but freeing no memory.  All the while processes
	clamor for free pages and starve.

Pass 2. Instantaneously free up 100MB of memory that we cleaned on the
	first pass.  Wake everyone back up and page everything back in.

When Vax memories got large (8MB or so :-) the second hand was added to
the original BSD pageout daemon.  It follows the first at a reasonable
distance and frees pages.  This ensures a more uniform level of pages in
the free list.
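
A toy model shows the difference in latency.  It assumes every page starts
dirty, that the front hand cleans the page it visits, and that a back hand
trailing "spread" pages behind frees whatever it finds clean.  This is a
simplification made to match the dirty-page scenario above (as I recall,
the real BSD front hand clears reference bits rather than cleaning pages),
and NPAGES and steps_until_first_free() are invented names.  A spread of
NPAGES behaves like a single hand that only frees a page on its second
visit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16   /* assumed size of the page ring for the model */

/*
 * Returns how many hand steps pass before the first page is freed.
 * The front hand cleans one page per step; the back hand trails it by
 * `spread` pages and frees the page under it if that page is clean.
 */
static size_t steps_until_first_free(size_t spread)
{
    bool clean[NPAGES] = { false };     /* every page starts dirty */

    assert(spread >= 1 && spread <= NPAGES);
    for (size_t step = 1; step <= 2 * NPAGES; step++) {
        size_t front = (step - 1) % NPAGES;

        clean[front] = true;            /* front hand: page written out */
        if (step - 1 >= spread) {       /* back hand has entered the ring */
            size_t back = (step - 1 - spread) % NPAGES;

            if (clean[back])
                return step;            /* back hand frees this page */
        }
    }
    return 0;                           /* not reached for a valid spread */
}
```

With a spread of 4 the first free page shows up after 5 steps; with a
spread of NPAGES it takes NPAGES + 1 steps, i.e. a whole sweep.  After that
first page the two-handed version keeps freeing one page per step, which is
the more uniform free list level described above.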

Well, it has been a while since I looked closely at this code so I would
defer to John if he disputes any of this.

Mike