Subject: Re: more on mysql benchmark
To: Chuck Silvers <chuq@chuq.com>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-kern
Date: 03/14/2005 00:45:37
Hi,
>>>>> On Sun, 13 Mar 2005 05:46:21 -0800, Chuck Silvers <chuq@chuq.com> said:
> ideally all of the pages in the system will contain valid data of one form
> or another (cached file data if there isn't enough demand for anon pages),
> so the total usage will often be 100% or close to that. with current
> defaults (and with the alternate settings that I suggested), 100% usage
> will usually be with all three usage types in the normal zone, and the
> pagedaemon will switch to the asymmetric consideration mode when one of the
> types enters the overweight zone (or if pages are freed for some other
> reason such as a process exiting or a file being truncated, which could put
> a type into the underweight zone).
One problem is that anonymous pages often exceed their max, and
currently the system's behavior at that point is very often not good.
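For concreteness, the three-zone bookkeeping you describe can be
sketched as a toy model like this (names and structure are made up for
illustration; this is not the actual uvm pagedaemon code):

```python
# Toy model of the underweight/normal/overweight zones.
# "vmin" and "vmax" play the role of the {anon,exec,file}{min,max}
# sysctl tunables (percent of RAM); "usage" is the current share of
# pages of one type.

def zone(usage, vmin, vmax):
    """Classify one usage type relative to its (min, max) tunables."""
    if usage < vmin:
        return "underweight"   # below its guaranteed share
    if usage > vmax:
        return "overweight"    # above its cap; a reclaim target
    return "normal"

def pagedaemon_mode(usages, tunables):
    """The daemon switches to asymmetric consideration as soon as any
    usage type leaves the normal zone."""
    zones = [zone(u, lo, hi) for u, (lo, hi) in zip(usages, tunables)]
    return "asymmetric" if any(z != "normal" for z in zones) else "balanced"
```

With sum(max) > 100%, all three types can sit in the normal zone at
100% usage; with tight maxima such as {0,1}, at least one type must be
overweight whenever memory is full.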
> with your suggested setting of {{10,80},{0,2},{0,1}}, 100% usage will have
> to have at least one usage type in the overweight zone, and most likely all
> of them.
Yes.
One of my intentions is to keep file pages always in the overweight
zone, because even with file{min,max}={0,1}, file pages often occupy
too much physical memory (at least on my machine with 128MB of RAM).
My other intention is to stop the problem where page-queue reordering
causes anonymous memory to be paged out.
> the point of the freebsd-based patch from yamt (and the "generational"
> scheme that a few people experimented with a few years back) is that the
> single bit of access history that the mach-derived paging-queue system
> maintains isn't nearly enough information to allow decent decisions based
> on access patterns, so these schemes retain more history. I believe the
> main difference between these is that under the freebsd scheme, continued
> accesses give a page a linear boost in retention priority, whereas under
> the proposed generational scheme, continued accesses would give a page an
> exponential boost. either of these would mitigate the queue-reordering
> effects of enforcing the usage-balance tunables. so I guess implementing
> one of these would also be a good way to see which effect of changing the
> sysctl tunables is making more of a difference (and it seems like a good
> improvement in any case).
Simon is testing yamt's patch on his pc532.
yamt's patch plus {anon,exec,file}{min,max}={{0,0},{0,0},{0,0}} shows
results better than, or at least equal to, the hand-tuned settings
without the patch. I think he will post the results soon.
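For reference, the linear vs. exponential boost difference can be
sketched like this (hypothetical update rules and an arbitrary cap;
neither is the actual FreeBSD or yamt code):

```python
# Two ways to retain more access history than the single mach-style
# reference bit: a linear activity count vs. an exponential one.

ACT_MAX = 64  # arbitrary cap for the illustration

def linear_boost(prio, referenced):
    # FreeBSD-style: each observed access adds a constant increment.
    return min(prio + 1, ACT_MAX) if referenced else max(prio - 1, 0)

def exponential_boost(prio, referenced):
    # "Generational"-style: each observed access doubles the priority.
    return min(max(prio, 1) * 2, ACT_MAX) if referenced else prio // 2

# A page referenced on 5 consecutive scans:
lin = exp = 0
for _ in range(5):
    lin = linear_boost(lin, True)
    exp = exponential_boost(exp, True)
# lin == 5, exp == 32: the exponential scheme separates frequently
# accessed pages from the rest much faster.
```

Either variant makes a page's retention priority survive a queue
reordering, which is why it would mitigate the usage-balancing effect.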
> the history of page access is (supposedly) more maintained by the pmap
> "referenced" bit than by the position in the paging queue. cycling
> through the paging queues more quickly will reduce the effectiveness of
> that, but we do call pmap_clear_reference() before reactivating pages
> due to usage-balancing, so they'll be reclaimed the next time around
> unless they really are referenced again after this. I guess your point
> is that you believe this is still giving significant unfair preference
> to pages that are reactivated due to usage-balancing.
Yes, that's my point.
As far as I can see, under continuous file access, file pages tend to
keep growing to at least vm.filemax, and often beyond it.
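The second-chance behavior you describe (clear the reference bit on
reactivation, reclaim next time around unless the page really is
touched again) can be modeled roughly like this (a toy sketch, not the
real uvm code):

```python
# Toy second-chance scan: pmap_clear_reference() is modeled by
# resetting a per-page "referenced" flag when a page survives a scan.

class Page:
    def __init__(self, name):
        self.name = name
        self.referenced = False

def scan(inactive, touched):
    """One pagedaemon pass: reclaim unreferenced pages; give
    referenced pages a second chance after clearing their bit."""
    reclaimed, survivors = [], []
    for page in inactive:
        if page.name in touched:
            page.referenced = True      # access would set this via the pmap
        if page.referenced:
            page.referenced = False     # pmap_clear_reference() analogue
            survivors.append(page)      # reactivated for now
        else:
            reclaimed.append(page)
    return reclaimed, survivors
```

A page reactivated by usage-balancing gets exactly one such free pass;
my concern is that under continuous file access that pass is granted
again and again, which amounts to a persistent preference.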
>> > to respond to some of your other points:
>> > - it seems fine to have the sum of the max values be > 100%
>> > (though that does make the description of the semantics somewhat
>> > awkward).
>> At memory shortage condition, sum > 100% makes the page daemon
>> abandon page-access-history due to the page-queue-reordering effect.
>> That's one of things that I'd like to avoid.
> like I said earlier, the usage-balancing code will reorder the queues
> regardless of what the tunables are set to. I don't see how it's possible
> to enforce any limits based on usage type without reordering the queues.
> (it may turn out that if we retain additional access history ala freebsd,
> then we don't need the usage-type stuff at all, but that seems doubtful.)
According to Simon's preliminary results with yamt's patch, it seems
we actually don't need the usage-type stuff by default.
That doesn't mean the usage-type stuff is never needed, though.
For example, Thor set vm.{anon,file}{min,max}={{10,40},{30,70}} on
ftp.netbsd.org to prevent supfilesrv and rsyncd from flushing the file
cache. This sort of tuning can only be done by a human who knows the
exact long-term workload, so the usage-type stuff is still useful.
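To make that guarantee concrete (the RAM size below is purely
hypothetical; I don't know ftp.netbsd.org's real configuration):

```python
# Back-of-the-envelope view of what a min tunable guarantees:
# min is a floor, as a percentage of RAM, below which the pagedaemon
# will not push that usage type.

RAM_MB = 1024  # hypothetical machine size, for illustration only

tunables = {           # vm.{anon,file}{min,max} = {{10,40},{30,70}}
    "anon": (10, 40),
    "file": (30, 70),
}

guaranteed = {t: RAM_MB * lo // 100 for t, (lo, hi) in tunables.items()}
# On this hypothetical 1GB machine, file pages would keep a floor of
# ~307 MB, so rsyncd/supfilesrv activity could not flush the file
# cache below that.
```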
>> > - I don't know why file{min,max} would want to have any specific
>> > relation to exec{min,max}.
>>
>> It's because primary reason of the existence of those VM parameters is
>> to prevent the famous UBC effect, i.e. file pages kick out anonymous and
>> executable pages from physical memory.
>> So, we nearly always have to give executable (and anonymous) pages
>> priority over file pages.
> yes, but merely setting execmin (or in your scheme, execmax) to be non-zero
> guarantees a certain amount of memory for executable pages, regardless of
> what the tunables for file pages are set to. so why would it be necessary
> that the amount of memory guaranteed to be available for exec pages be
> greater than the amount of memory guaranteed to be available for file pages?
OK, my claim that vm.exec{min,max} must be greater than
vm.file{min,max} may have been wrong.
The real reason is that even vm.file{min,max}={0,0} often gives too
much physical memory to file pages.
>> > - I would think that the default filemin should be at least 5%,
>> > since that was the default minimum size of the buffer cache when
>> > we used it for file data.
>>
>> I don't think so, because usually vm.file{min,max}={0,1} doesn't make
>> file pages smaller than 5%.
>> The following is what I saw on the 128MB-RAM-machine with
>> vm.{anon,exec,file}{min,max}={{10,80},{0,2},{0,1}}:
>>
>> anon exec file
>> 31% 16% 50%
>> 28% 15% 50%
>> 32% 16% 52%
>> 61% 15% 32%
>> 74% 14% 20%
>> 32% 3% 74%
>> 35% 4% 70%
>> 77% 15% 15%
>>
>> It seems file pages are rather active even with the above parameters.
> with whatever workload you were running when you collected those numbers,
> those settings didn't cause a problem. my point is that there will be
> other common workloads where those settings will cause a problem.
I have been using that setting for more than six months, on machines
which have enough RAM for anonymous and executable memory.
And as far as I can see, the setting doesn't cause any problems,
except that sometimes (not always) too many pages end up free.
>> BTW, have you compared your proposal of the new default:
>> vm.{anon,exec,file}{min,max}={{80,90},{5,30},{5,20}}
>> with your better sysctl settings for the MySQL benchmark?:
>> vm.{anon,exec,file}{min,max}={{80,99},{5,30},{1,20}}
>>
>> Also, is it possible to measure my proposal against it?
>> vm.{anon,exec,file}{min,max}={{10,80},{0,2},{0,1}}
> I haven't had a chance to do any more runs yet, and I won't get more time
> until next weekend. but I'll try them then.
Thanks.
Please test yamt's patch with (at least) vm.{anon,exec,file}{min,max}
={{0,0},{0,0},{0,0}}, too.
--
soda