tech-kern: Re: failing to keep a process from swapping

Subject: Re: failing to keep a process from swapping
To: Arto Selonen <arto+dated+1098791780.c6296bedeb6e2b19@selonen.org>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-kern
Date: 10/29/2004 21:10:48
>>>>> On Fri, 22 Oct 2004 13:18:52 +0300 (EEST),
	Arto Selonen <arto@selonen.org> said:

> On October 7th, I sent the following to current-users:
> 	http://mail-index.netbsd.org/current-users/2004/10/07/0014.html
		:
> 	vm.anonmin = 65
> 	vm.execmin = 2
> 	vm.filemin = 10
> 	vm.anonmax = 80
> 	vm.execmax = 5
> 	vm.filemax = 15

> However, no matter what I do, as soon as RSS of squid grows to
> ~330-350 MB, it starts to throw pages to swap (ie. swap usage starts
> to grow and RSS of squid shrinks). At the same time, file cache is
> kept at ~350-400MB range. For a 1 GB system that really does not use
> memory for anything else besides squid, this is not bad, but I would
> like to at least feel like *I* am the one controlling the balance
> between memory and disk caching (from squid's point of view).

>>>>> On Fri, 22 Oct 2004 18:37:08 +0300 (EEST),
	Arto Selonen <arto@selonen.org> said:

> On Fri, 22 Oct 2004, Julio M. Merino Vidal wrote:

>> Since I set:
>> 
>> vm.filemin=2
>> vm.filemax=4
>> 
>> in my /etc/sysctl.conf file, my two boxes hardly swap.
>> All other values set to defaults.

> Although I appreciate the data point, I doubt if merely reducing
> vm.filemax further would help. After all, it is already about 20
> percentage points above the set maximum.

Setting vm.filemax=4 does help, because it lets the kernel page
scanner (i.e. the VM page balancer) start working against the file
cache at an earlier stage.

> My understanding from the 'Bad response' thread was that the maximums
> would be exceeded only when there was memory to spare.

That's a misunderstanding.
What the vm.{anon,exec,file}{min,max} parameters do is prevent
the page scanner from working until the ratios hit the parameters.

For example, a memory consumer can exceed its maximum not only when
there is memory to spare, but also when
  - there is another memory consumer which already exceeds its maximum
and
  - the page scanner thinks that the consumer has used its memory recently.

> Thus, when 'squid' wants more memory, it should not need to go to
> swap, as there is 200+ MB of *excess* usage for file cache.

Perhaps the file cache actually decreased to 15% of managed memory,
if only for a moment. If that actually happened, then anonymous memory
and file cache were on an equal footing, and thus anonymous memory might
have been paged out.

>>>>> On Tue, 26 Oct 2004 14:55:14 +0300 (EEST),
	Arto Selonen <arto@selonen.org> said:

> (I had vm.bufcache as 15%, and reducing it via sysctl
> down to the hard coded lower limit of 5% only caused that "missing" memory
> usage to drop by ~50MB). 

You can reduce the bufcache further by using vm.bufmem_hiwater and
vm.bufmem_lowater, although 5% (or even 10% or 15%) is a reasonable
value for squid.
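
As a rough sketch of how one might pick a value for vm.bufmem_hiwater:
the helper below converts a percentage of physical memory into bytes.
It assumes the variable is expressed in bytes (check sysctl(7) on your
release); pct_of_ram_bytes is a hypothetical name, not a system
utility, and the 1024 MB and 2% figures are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: bytes corresponding to PCT percent of TOTAL_MB megabytes.
pct_of_ram_bytes() {
    total_mb=$1
    pct=$2
    echo $(( total_mb * 1048576 * pct / 100 ))
}

# Example: 2% of a 1 GB machine (illustrative numbers, not a recommendation).
hiwater=$(pct_of_ram_bytes 1024 2)
echo "vm.bufmem_hiwater=$hiwater"

# To inspect or apply the value on a live system (left commented out;
# assumes a kernel that exports this variable):
#   sysctl vm.bufmem_hiwater
#   sysctl -w vm.bufmem_hiwater="$hiwater"
```

The printed line can be pasted into /etc/sysctl.conf once the value has
been tried at runtime.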

BTW, I think the 5% hard limit in the kernel is too high.

> I haven't (yet) figured out the vm.bufcache usage, so I am open to
> any help regarding that. What is it used for?

The (old) bufcache is used for file system metadata,
i.e. directories, i-nodes, etc.

> but how does that differ from "file cache"?

File cache is used for actual file contents.

> "And all I wanted was to set the limits so that the system would use RAM
> in the most efficient manner, based on known usage profile. Why do I
> need to look at the code, and somebody's dissertation thesis, to be able
> to tune system memory usage?"

I'd recommend decreasing vm.filemin and vm.filemax. Because squid
itself caches file contents in its application memory space, there is
no reason to cache the same file in both squid and the kernel.
The old bufcache is rather more useful for squid.

Also, if you really want to stop the paging, you can increase
vm.anonmin to the size of total anonymous memory usage (i.e. squid +
kernel).
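
Since vm.anonmin is a percentage, the anonymous memory footprint has to
be converted first. A minimal sketch, assuming the footprint is known in
megabytes (e.g. estimated from ps or top output); anonmin_pct is a
hypothetical helper and the 700 MB figure is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: percentage of TOTAL_MB taken up by ANON_MB, rounded up
# so that the resulting vm.anonmin covers the whole footprint.
anonmin_pct() {
    anon_mb=$1
    total_mb=$2
    echo $(( (anon_mb * 100 + total_mb - 1) / total_mb ))
}

# Example: ~700 MB of anonymous memory (squid + kernel) on a 1024 MB box.
echo "vm.anonmin=$(anonmin_pct 700 1024)"
```

The ratios are applied to managed memory, which is somewhat less than
installed RAM, so the result is best treated as a lower bound.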
--
soda