NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



Frank Kardel <kardel%netbsd.org@localhost> writes:

[snip]

>  TLDR:
>  - pagedaemon aggressively starts pool draining once KVA free falls below 10%
>  - ZFS won't free pool pages until free memory falls below uvmexp.freetarg.
>  - there is a huge gap between uvmexp.freetarg and 10% KVA free,
>    increasing with larger memory (10%)
>  - while below 10% KVA free, ZFS eventually depletes all other pools that
>    are cooperatively giving up pages, causing all sorts of shortages in
>    other areas (visible in e.g. network buffers)

This is a pretty good description of a problem I am/was seeing with the
daily cron job that checks for core files.  On a DOMU with not a lot of
memory (12GB - 16GB) and a WHOLE lot of ZFS filesets, this job would never
complete and the guest would appear to lock up (actually it may be any
job that ran "find" across a ZFS fileset).  To work around it I ended up
commenting out the daily job.  The guest is my build system for the OS,
and it would also start to bog down and eventually hang after a few OS
builds, but that was a more manageable situation.

With the simple patch Kardel provided, the daily job runs to completion
and the system is still responsive after a couple of days of uptime.  I
have not had time to run builds to see how the patch affects that side
of things.  The guest has 2 vcpus, and I sometimes abuse it pretty hard
by running 3 builds at the same time, each with -j2 on the build.sh
command line.  Very often the system would hang at some point if I did
this, and I had to back off and run only 1 or 2 at a time.

>  Mitigation: allow ZFS to detect free KVA memory falling below 10% to 
>  start reclaiming memory.
>  
>  It is not related to XEN at all. Just ZFS + large memory is sufficient 
>  for the problems to occur.
>  Base issue is the big difference between 10% free KVA memory limit and 
>  uvmexp.freetarg.
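
To make the threshold mismatch concrete, here is a toy Python model. It is a sketch under stated assumptions, not the NetBSD or ZFS code: it treats free KVA and free memory as a single page count so both thresholds sit on one axis, it assumes 4 KiB pages and a made-up fixed freetarg of 4096 pages (the real uvmexp.freetarg is tuned by the kernel), and every function name is hypothetical. It computes the window of free-page counts in which the pagedaemon is already draining cooperating pools while the unpatched ZFS still keeps its pages, and shows how the mitigated predicate closes that window by letting ZFS react at the same 10% point.

```python
"""Toy model of the reclaim-threshold mismatch discussed in kern/57558.

Assumptions (hypothetical, for illustration only):
- free KVA and free physical memory are modelled as one page count;
- freetarg is a fixed number of pages;
- function names do not correspond to real kernel functions.
"""


def pagedaemon_drains_pools(free_pages: int, total_pages: int) -> bool:
    # Pagedaemon side: start draining pools once free drops below 10% of total.
    return free_pages < total_pages // 10


def zfs_gives_back_original(free_pages: int, freetarg: int) -> bool:
    # ZFS side before the patch: only reclaim below freetarg.
    return free_pages < freetarg


def zfs_gives_back_mitigated(free_pages: int, total_pages: int, freetarg: int) -> bool:
    # Mitigation as described: ZFS also reclaims once the 10% threshold is crossed.
    return (zfs_gives_back_original(free_pages, freetarg)
            or pagedaemon_drains_pools(free_pages, total_pages))


def starvation_window(total_pages: int, freetarg: int) -> range:
    # Free-page counts where other pools are drained but unpatched ZFS
    # keeps its pages: [freetarg, 10% of total), empty if freetarg is larger.
    return range(freetarg, max(freetarg, total_pages // 10))


if __name__ == "__main__":
    FREETARG = 4096   # hypothetical fixed target, in pages
    PAGE_KIB = 4      # assumed page size in KiB
    for gib in (4, 16, 64):
        total = gib * 1024 * 1024 // PAGE_KIB  # pages in `gib` GiB
        window = starvation_window(total, FREETARG)
        print(f"{gib:>3} GiB: window of {len(window)} pages "
              f"({len(window) * PAGE_KIB // 1024} MiB)")
```

With a fixed freetarg, the window in this model is roughly 10% of memory, which is the "increasing with larger memory" effect in the quoted analysis.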

I am not sure that "large memory" needs to be all that large to prompt
the problem.  The description of what happens when ZFS gobbles
everything up is pretty close to what I am seeing...

>  I seem to explain the mechanism over and over again. And so far no one 
>  has verified this analysis.
>  
>  -Frank
>  




-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org

