NetBSD-Users archive


Re: Linux compat and swap



On Fri, Apr 24, 2020 at 10:11:03PM +1000, Paul Ripke wrote:
> On Fri, Apr 24, 2020 at 10:18:10AM +0100, Sad Clouds wrote:
> > On Fri, 24 Apr 2020 00:50:38 -0700
> > "Greg A. Woods" <woods%planix.com@localhost> wrote:
> > 
> > > On the other hand the vm.*max percentages are just limits to how many
> > > pages will be reclaimed from other uses when a given category faces
> > > pressure from extensive and immediate use -- they do not set the
> > > maximum use for a given category.  On my rsync backup host I normally
> > > see 85% of pages allocated to file cache even though vm.filemax is
> > > just 50.
> > 
> > But what is the value of vm.filemax? It tells the system what
> > percentage of memory to steal from other uses, but this is
> > counterproductive, since by stealing from vm.anon and vm.exec it can result in
> > swapping and extra disk I/O, which is what vm.filemax is trying to
> > avoid in the first place, that is unnecessary disk I/O.
> > 
> > I think the behaviour needs fixing, i.e. if vm.filemax is about to push
> > pages to swap, then it should stop.
> 
> Greg's description is far more accurate than mine, I was taking a very
> simplistic view. Really, the min/max just provide a weighting or
> priority for eviction, for each page type.
> 
> It really depends on use case. I remember relatively small shared Uni
> UNIX systems with hundreds of users logged in, most idle, and their
> processes all paged out to make room for active processes. Then there
> are databases that want to fill memory with their own block cache,
> ideally they use O_DIRECT so the file cache doesn't even come into
> the picture. Then you have soft realtime or serving systems where
> you don't want any paging, ever. And you mlock text pages at startup.
> And you don't even run with swap configured.
> (Yes, I've seen a watchdog thread that normally ticks every 100ms
> start ticking once a minute because it had to continually page in
> the handful of text pages it needed from a busy disk).
> 
> If you only want the filecache to consume "available" memory, then
> set filemin & filemax to some small value.
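Greg's point that a vm.*max value is not a cap can be shown with a little arithmetic. The sketch below is a simplified Python model, not kernel code. It assumes each percentage is turned into a page count against a base of "pages the pager manages" (my reading of sys/uvm/uvm_pdpolicy_clock.c is that this is active + inactive + free pages, but please check the source). The helper names `threshold` and `classify` are hypothetical. Comparing a category's page count with its thresholds only yields "under min" and "over max" flags. As I understand it, nothing here stops the file cache from holding 85% of memory with vm.filemax at 50; it only marks those pages as preferred candidates once the page daemon needs memory.

```python
def threshold(pct, total_pages):
    """Convert a vm.*min / vm.*max percentage into a page count.

    Integer arithmetic, as a kernel would do it (simplified model).
    """
    return pct * total_pages // 100


def classify(pages, min_pct, max_pct, total_pages):
    """Return (under_min, over_max) for one page category.

    Neither flag limits how many pages the category may hold; in this model
    the flags only feed the page daemon's choice of what to free.
    """
    under_min = pages <= threshold(min_pct, total_pages)
    over_max = pages > threshold(max_pct, total_pages)
    return under_min, over_max


if __name__ == "__main__":
    total = 1_000_000  # hypothetical number of managed pages

    # Greg's rsync host: 85% of pages in the file cache, vm.filemax=50
    # (vm.filemin=10 is an assumed value).
    print(classify(850_000, 10, 50, total))  # (False, True)

    # Paul's suggestion: small vm.filemin/vm.filemax (2 and 5 are example
    # values). 6% of pages in the file cache is already "over max".
    print(classify(60_000, 2, 5, total))  # (False, True)
```

With small vm.filemin and vm.filemax, the file cache counts as "over max" as soon as it holds more than a few percent of pages. Under pressure, file pages then become the preferred victims instead of anonymous and executable ones.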

What is still unclear to me is how the system arbitrates between them.

Say I set several vm.*max values to some high percentage.

If, at instant t, file cache pages are needed and memory is all in use,
processes will start paging. If, at t+delta, it is anonymous memory that
is scarce, the system may reclaim memory from the file cache it has just
favoured. And so on.

Doesn't this mean that, with all the vm.*max values set too high, the
system might start thrashing? Are the various vm.*max values compared
with each other, so that the biggest one gets priority when categories
compete for a diminishing resource? (In which case, setting the max
values equal would lead to a dead end?)
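As far as I can tell from reading sys/uvm/uvm_pdpolicy_clock.c, the max values are not compared with each other. When the page daemon starts a scan, each page type is checked only against its own min and max. The result decides which types are "reactivated" (given another chance) instead of freed. What follows is a simplified Python model of that decision, written from memory and not copied from the kernel. The names `Category` and `reactivate_flags` are hypothetical, and the swap-full check is reduced to a boolean parameter.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One page type (anon, file or exec) in a simplified UVM model."""
    pages: int    # pages currently held by this type
    min_pct: int  # vm.<type>min
    max_pct: int  # vm.<type>max

    def under_min(self, total):
        return self.pages <= self.min_pct * total // 100

    def over_max(self, total):
        return self.pages > self.max_pct * total // 100


def reactivate_flags(anon, file, exe, total, swap_full=False):
    """Decide which page types a scan spares ("reactivates") instead of freeing.

    Each type is judged only against its OWN min/max: it is spared if it is
    under its minimum, or if it is within its maximum while some other type
    is over its maximum. The maxima are never compared with one another.
    """
    a_under, f_under, x_under = (c.under_min(total) for c in (anon, file, exe))
    a_over, f_over, x_over = (c.over_max(total) for c in (anon, file, exe))

    anon_react = a_under or (not a_over and (f_over or x_over))
    file_react = f_under or (not f_over and (a_over or x_over))
    exec_react = x_under or (not x_over and (a_over or f_over))

    # If file and exec pages are spared and anon pages cannot be freed either
    # (spared, or no swap space left to page them out to), nothing could be
    # reclaimed, so drop all protection and let the scan free any type.
    if file_react and exec_react and (anon_react or swap_full):
        anon_react = file_react = exec_react = False

    return {"anon": anon_react, "file": file_react, "exec": exec_react}


if __name__ == "__main__":
    # Thierry's case: every vm.*max set high (90 is an example value), so no
    # type is ever "over max" and none is under its minimum either.
    print(reactivate_flags(Category(400, 10, 90), Category(400, 10, 90),
                           Category(100, 5, 90), total=1000))
    # {'anon': False, 'file': False, 'exec': False}
```

In that case all three flags are false, so the scan frees whatever inactive pages it meets regardless of type. Under this model, raising every maximum does not hand out priority; it simply switches the preference off. A preference only appears once some type crosses its own maximum. For example, a file cache at 85% with vm.filemax=50 makes anon and exec pages the ones spared.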

And is the arbitration between these needs evaluated at each clock tick
(the typical 100Hz processing frequency), or is there a time window, or
are kernel statistics used, so that the arbitration takes into account
more than the current instant?

Does anybody know which are the main source files implementing it, so
that, if no in-depth documentation is available, the C files would give
the picture?

I guess that to use a node optimally, one has to gather statistics and
run tests with a typical workload, balancing the values so that the best
result in the average case is achieved. It's a world unto itself...
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

