Subject: Re: Suggested fix for NetBSD 1.2 kernels
To: None <jonathan@DSG.Stanford.EDU>
From: Wolfgang Solfrank <ws@kurt.tools.de>
List: tech-kern
Date: 09/18/1996 16:52:04
> For those who care, the bug is that there is *nothing* in the
> code that enforces a minimum rate at which pages on the inactive
> list are actually cleaned and made free.  John Dyson's patch
> fixes this by waking up the pager whenever pages are short.

Actually, what I was trying to explain was that it's not really the waking
of the pager when pages are short that is intended here, but the waking when
resources that were short become available again. The fact that the same
address is used for signalling both conditions is more or less an accident,
although it helps by making the system's behaviour more, as Mike put it,
"unfair".
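
To make this concrete, here is the pattern in (invented) pseudo-C; the
names are mine, not the actual kernel symbols:

    int vm_pages_wanted;    /* one address, used purely as a sleep channel */

    /* pageout daemon main loop */
    for (;;) {
        tsleep(&vm_pages_wanted, PVM, "pageout", 0);
        /* ... scan the inactive queue, see below ... */
    }

    /* condition 1: somebody notices that pages are short */
    wakeup(&vm_pages_wanted);

    /* condition 2: a pager resource that had run out is available again */
    wakeup(&vm_pages_wanted);

Since both wakeups go through the same address, the daemon cannot tell
which of the two conditions woke it up.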

> The point I'm trying to get across is that
> even with the fix applied, architecturally or philosophically, the
> pager is doing the *WRONG THING* when it gets woken up.  Or rather,
> it's not *DIRECTLY* doing the *right* thing, which is to make sure
> some pages are freed up.  All it's doing is forcing more pages from
> the active list to the inactive list, and thus (hopefully) causing
> some of those pages to eventually get freed.

That's not correct. What normally happens is this (looking only at the
relevant parts here):

When the pager is first woken up due to low memory, it scans the inactive queue.

Pages that were referenced since being placed on the inactive queue are
reactivated.
Pages that are clean are immediately transferred to the free queue.
Dirty pages that are actively being paged out are ignored for now.
For the remaining dirty pages a pageout is initiated.

Once enough pages have been transferred to the free queue (via the second
case above) to satisfy the free target, the scan stops.
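
Roughly, in pseudo-C (all the helper names here are invented for
illustration, not the actual functions):

    void
    scan_inactive_queue(void)
    {
        vm_page_t p, next;

        for (p = inactive_queue_first(); p != NULL; p = next) {
            next = inactive_queue_next(p);

            /* free target satisfied? then the scan stops */
            if (cnt.v_free_count >= cnt.v_free_target)
                break;

            if (page_referenced(p)) {
                reactivate_page(p);        /* back to the active queue */
                continue;
            }
            if (page_clean(p)) {
                free_page(p);              /* freed right here */
                continue;
            }
            if (pageout_in_progress(p))
                continue;                  /* ignored for now */

            start_pageout(p);              /* may sleep, see below */
        }
    }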

Later on, before going to sleep again, the pageout daemon looks through the
ongoing pageouts for any that have completed and marks the relevant pages
clean. Note that the latter is also done when starting a pageout on a page,
but that is (probably incorrectly, architecturally speaking) the
responsibility of the pager.
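
Again as an invented sketch:

    /* before sleeping again: reap the finished pageouts */
    void
    reap_finished_pageouts(void)
    {
        struct pageout *po;

        for (po = pending_first(); po != NULL; po = pending_next(po))
            if (pageout_completed(po))
                mark_pages_clean(po);    /* freeable on the next scan */
    }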

Now the pathological case:

When trying to start a pageout, the pager might run short of some resource.
If this happens, the pageout daemon goes to sleep. It is this sleep that
blocks the progress of the pageout in this pathological case. When the
wakeup from this sleep occurs, the pageout daemon isn't "forcing more pages
from the active list to the inactive list"; rather, it continues the scan
of the inactive list, probably blocking again on the next attempt to
initiate a pageout of a dirty page.
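
In terms of the sketch above, the stall lives in start_pageout()
(hypothetical names again):

    void
    start_pageout(vm_page_t p)
    {
        while (!pager_resources_available()) {
            /*
             * This is the sleep that stalls the pageout. The wakeup
             * merely lets the scan continue; the very next dirty page
             * will likely block here again.
             */
            tsleep(&vm_pages_wanted, PVM, "pagerwait", 0);
        }
        issue_async_write(p);    /* the actual asynchronous pageout */
    }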

> Running *without* John's patch, with a "systat -w 1 vmstat" already
> running in a local (rlogin, not Xterm) window, is an excellent way to
> demonstrate that.  During the freeze period, the pageout process is
> being awakened once a second, and doing its thing.  But that doesn't
> help at *all* until it's cleaned *all* the pages on the inactive list.
> None of those pages actually get freed until *every* page on the
> inactive list gets written back.  My inactive list had 18Mbytes
> of pages. 

No, pages WILL get freed from the inactive queue during the scan: every
clean page encountered is moved to the free queue immediately.

> The pager tries to start cleaning (i.e., initiates asynchronous
> writeback) on at least cnt.v_free_min pages under memory-shortfall
> conditions.  That's 64 pages, or 256Kbytes, on my systems. Assume the
> inactive list is approximately 1/3 of physical memory.  That lets you
> estimate how long the freezes will last; it's fairly accurate on my
> systems.  [[note 1]]

This is just a coincidence with the fact that the default for the number
of outstanding I/Os for the swap pager is 64. The pager tries to start
cleaning on at most NPENDINGIO pages from the inactive queue, but only until
enough pages have been freed to satisfy cnt.v_free_target.
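
Incidentally, that limit also reproduces the timing Jonathan saw.
Assuming 4Kbyte pages and one wakeup per second:

    18Mbyte inactive list / 4Kbyte per page  = ~4600 pages
    4600 pages / 64 pageouts per wakeup      = ~72 seconds of freeze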

One might want to try increasing NPENDINGIO by using, say, an
"options NPENDINGIO=256" entry in one's config file. (Anyone wanting to try
this, please contact me privately, as I'm not sure that it is really that
easy.)
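
The idea being that the option simply overrides the compile-time default,
along the lines of the usual options pattern (a sketch, not the actual
source):

    #ifndef NPENDINGIO
    #define NPENDINGIO 64    /* default pending swap pager I/Os */
    #endif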

But IMHO it's time to discuss the fairness approach that Mike brought up
here.

The current approach blocks the pageout daemon completely when there is a
shortage of resources needed to page out a dirty page. The idea is obviously
to prevent processes that write to their memory from eventually hogging all
memory with dirty pages. If this is really what we want, we should probably
block the writing process itself rather than the pageout daemon, e.g. in
vm_fault when there is a write fault while a memory shortage condition
exists.
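
As a sketch of what I mean (hypothetical, not working code):

    /* at the top of the write fault handling path */
    if ((fault_type & VM_PROT_WRITE) &&
        cnt.v_free_count < cnt.v_free_min) {
        wakeup(&vm_pages_wanted);    /* make sure the daemon runs */
        tsleep(&cnt.v_free_count, PVM, "memwait", 0);
        /* continue the fault once memory has been freed again */
    }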
--
ws@TooLs.DE     (Wolfgang Solfrank, TooLs GmbH) 	+49-228-985800