Subject: Re: more on mysql benchmark
To: Chuck Silvers <chuq@chuq.com>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-kern
Date: 03/14/2005 00:45:37
Hi,

>>>>> On Sun, 13 Mar 2005 05:46:21 -0800, Chuck Silvers <chuq@chuq.com> said:

> ideally all of the pages in the system will contain valid data of one form
> or another (cached file data if there isn't enough demand for anon pages),
> so the total usage will often be 100% or close to that.  with current
> defaults (and with the alternate settings that I suggested), 100% usage
> will usually be with all three usage types in the normal zone, and the
> pagedaemon will switch to the asymmetric consideration mode when one of the
> types enters the overweight zone (or if pages are freed for some other
> reason such as a process exiting or a file being truncated, which could put
> a type into the underweight zone).
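
(To make those zones concrete: a rough sketch of the classification,
with made-up names; this is only the idea, not the actual uvm_pdaemon
code.)

/*
 * "pct" is the share of pageable memory a usage type (anon, exec or
 * file) currently holds; "min" and "max" are the corresponding
 * vm.*min / vm.*max tunables, in percent.
 */
enum balance_zone { ZONE_UNDER, ZONE_NORMAL, ZONE_OVER };

static enum balance_zone
classify_usage(int pct, int min, int max)
{
	if (pct < min)
		return ZONE_UNDER;	/* below its guaranteed share */
	if (pct > max)
		return ZONE_OVER;	/* above its allowed share */
	return ZONE_NORMAL;
}

The page daemon would scan the queues symmetrically while every type
is in ZONE_NORMAL, and prefer pages of a ZONE_OVER type once one
appears, which is the "asymmetric consideration" mentioned above.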

One problem is that anonymous pages often exceed their max, and the
system's behavior in that situation is currently very often not good.

> with your suggested setting of {{10,80},{0,2},{0,1}}, 100% usage will have
> to have at least one usage type in the overweight zone, and most likely all
> of them.

Yes.
One of my intentions is to always keep file pages in the overweight
zone, because even with file{min,max}={0,1}, file pages often occupy
too much physical memory (at least on my machine with 128MB of RAM).
The other intention is to stop the problem where page-queue reordering
causes anonymous memory to be paged out.

> the point of the freebsd-based patch from yamt (and the "generational"
> scheme that a few people experimented with a few years back) is that the
> single bit of access history that the mach-derived paging-queue system
> maintains isn't nearly enough information to allow decent decisions based
> on access patterns, so these schemes retain more history.  I believe the
> main difference between these is that under the freebsd scheme, continued
> accesses give a page a linear boost in retention priority, whereas under
> the proposed generational scheme, continued accesses would give a page an
> exponential boost.  either of these would mitigate the queue-reordering
> effects of enforcing the usage-balance tunables.  so I guess implementing
> one of these would also be a good way to see which effect of changing the
> sysctl tunables is making more of a difference (and it seems like a good
> improvement in any case).
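
(To illustrate the difference being described: a sketch with made-up
names and a made-up cap, taken from neither the FreeBSD code nor
yamt's patch.)

/*
 * "act" is a per-page activity count the page daemon consults when it
 * scans the page; the page only moves toward reclaim after the count
 * has decayed back to zero.
 */
#define ACT_MAX	64	/* made-up cap on the activity count */

/* FreeBSD-like: each observed reference adds a fixed increment. */
static int
boost_linear(int act)
{
	return (act + 1 > ACT_MAX) ? ACT_MAX : act + 1;
}

/* Generational idea: each observed reference doubles the count. */
static int
boost_exponential(int act)
{
	int n = (act == 0) ? 1 : act * 2;

	return (n > ACT_MAX) ? ACT_MAX : n;
}

Roughly speaking, either way the count would be decremented when the
page is scanned and found unreferenced, so a frequently accessed page
can survive a queue reordering instead of being reclaimed on the very
next pass.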

Simon is testing yamt's patch on his pc532.
With yamt's patch plus {anon,exec,file}{min,max}={{0,0},{0,0},{0,0}},
the results are better than, or at least equal to, those of the
hand-tuned settings without the patch.  I think he will post the
results soon.

> the history of page access is (supposedly) more maintained by the pmap
> "referenced" bit than by the position in the paging queue.  cycling
> through the paging queues more quickly will reduce the effectiveness of
> that, but we do call pmap_clear_reference() before reactivating pages
> due to usage-balancing, so they'll be reclaimed the next time around
> unless they really are referenced again after this.  I guess your point
> is that you believe this is still giving significant unfair preference
> to pages that are reactivated due to usage-balancing.
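
(Roughly the sequence referred to; the wrapper name is made up, and
the real reactivation path in the page daemon does more than this.)

#include <sys/param.h>
#include <uvm/uvm.h>

static void
reactivate_for_balance(struct vm_page *pg)
{
	/* Forget the recorded access history for this page... */
	pmap_clear_reference(pg);
	/* ...then put it back on the active queue; on the next pass it
	 * is kept only if it has been referenced again in the meantime. */
	uvm_pageactivate(pg);
}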

Yes, that's my point.
As far as I can see, under continuous file access, file pages tend to
keep growing at least up to vm.filemax, and often beyond it.

>> > to respond to some of your other points:
>> >  - it seems fine to have the sum of the max values be > 100%
>> >    (though that does make the description of the semantics somewhat
>> >    awkward).

>> At memory shortage condition, sum > 100% makes the page daemon
>> abandon page-access-history due to the page-queue-reordering effect.
>> That's one of things that I'd like to avoid.

> like I said earlier, the usage-balancing code will reorder the queues
> regardless of what the tunables are set to.  I don't see how it's possible
> to enforce any limits based on usage type without reordering the queues.
> (it may turn out that if we retain additional access history ala freebsd,
> then we don't need the usage-type stuff at all, but that seems doubtful.)

According to Simon's preliminary results with yamt's patch, it seems
that we actually don't need the usage-type stuff by default.

That doesn't mean we never need the usage-type stuff, though.
For example, Thor set vm.{anon,file}{min,max}={{10,40},{30,70}} on
ftp.netbsd.org to prevent supfilesrv and rsyncd from flushing the
file cache.  This sort of tuning can only be done by a human who
knows the exact long-term workload, so the usage-type stuff is still
useful.
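
(Tuning like Thor's is just a matter of writing those sysctl knobs.
A minimal userland sketch, assuming sysctlbyname(3) and using the
ftp.netbsd.org values above; it needs root and does nothing more than
warn on failure.)

#include <sys/param.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	static const struct { const char *name; int val; } knobs[] = {
		{ "vm.anonmin", 10 }, { "vm.anonmax", 40 },
		{ "vm.filemin", 30 }, { "vm.filemax", 70 },
	};
	size_t i;

	for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
		int val = knobs[i].val;

		if (sysctlbyname(knobs[i].name, NULL, NULL,
		    &val, sizeof(val)) == -1)
			perror(knobs[i].name);
	}
	return 0;
}

Putting the same assignments in /etc/sysctl.conf does the same thing
at boot, of course.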

>> >  - I don't know why file{min,max} would want to have any specific
>> >    relation to exec{min,max}.
>> 
>> It's because the primary reason for the existence of those VM
>> parameters is to prevent the famous UBC effect, i.e. file pages
>> kicking anonymous and executable pages out of physical memory.
>> So, we nearly always have to give executable (and anonymous) pages
>> priority over file pages.

> yes, but merely setting execmin (or in your scheme, execmax) to be non-zero
> guarantees a certain amount of memory for executable pages, regardless of
> what the tunables for file pages are set to.  so why would it be necessary
> that the amount of memory guaranteed to be available for exec pages be
> greater than the amount of memory guaranteed to be available for file pages?

OK, my statement that vm.exec{min,max} must be greater than
vm.file{min,max} may have been wrong.
The real reason is that even vm.file{min,max}={0,0} often gives too
much physical memory to file pages.

>> >  - I would think that the default filemin should be at least 5%, 
>> >    since that was the default minimum size of the buffer cache when
>> >    we used it for file data.
>> 
>> I don't think so, because usually vm.file{min,max}={0,1} doesn't make
>> file pages smaller than 5%.
>> The following is what I saw on the 128MB-RAM-machine with
>> vm.{anon,exec,file}{min,max}={{10,80},{0,2},{0,1}}:
>> 
>> anon exec file
>> 31%  16%  50%
>> 28%  15%  50%
>> 32%  16%  52%
>> 61%  15%  32%
>> 74%  14%  20%
>> 32%   3%  74%
>> 35%   4%  70%
>> 77%  15%  15%
>> 
>> It seems file pages are rather active even with the above parameters.

> with whatever workload you were running when you collected those numbers,
> those settings didn't cause a problem.  my point is that there will be
> other common workloads where those settings will cause a problem.

I have been using those settings for more than 6 months on machines
which have enough RAM for anonymous and executable memory.
As far as I can see, the settings don't cause any problems, except
that sometimes (not always) too many pages are left free.

>> BTW, have you compared your proposal of the new default:
>> vm.{anon,exec,file}{min,max}={{80,90},{5,30},{5,20}}
>> with your better sysctl settings for the MySQL benchmark?:
>> vm.{anon,exec,file}{min,max}={{80,99},{5,30},{1,20}}
>> 
>> Also, is it possible to measure my proposal against it?
>> vm.{anon,exec,file}{min,max}={{10,80},{0,2},{0,1}}

> I haven't had a chance to do any more runs yet, and I won't get more time
> until next weekend.  but I'll try them then.

Thanks.
Please test yamt's patch with (at least)
vm.{anon,exec,file}{min,max}={{0,0},{0,0},{0,0}}, too.
--
soda