NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



The following reply was made to PR kern/57558; it has been noted by GNATS.

From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
        kardel%netbsd.org@localhost
Subject: Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)
Date: Sun, 05 May 2024 07:37:20 -0400

 Frank Kardel <kardel%netbsd.org@localhost> writes:
 
 [snip]
 
 >  TLDR:
 >  - pagedaemon aggressively starts pool draining once free KVA falls
 >  below 10%
 >  - ZFS won't free pool pages until free memory falls below
 >  uvmexp.freetarg
 >  - there is a huge gap between uvmexp.freetarg and the 10% free KVA
 >  threshold, and the gap grows with larger memory
 >  - while below 10% free KVA, ZFS eventually depletes all the other
 >  pools that cooperatively give up pages, causing all sorts of
 >  shortages elsewhere (visible in e.g. network buffers)
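 
 To restate the mechanism as code, here is a stand-alone sketch of the
 two thresholds.  The names and numbers are mine, not the actual
 uvm_pdaemon.c or ZFS arc.c logic, and the freetarg value is only a
 guess; the point is just that the two checks fire at very different
 levels:
 
     #include <stdbool.h>
     #include <stdio.h>
 
     #define MB (1024UL * 1024)
     #define GB (1024 * MB)
 
     /* pagedaemon side: drain every pool below 10% free KVA. */
     static bool
     pagedaemon_drains(unsigned long kva_free, unsigned long kva_total)
     {
         return kva_free < kva_total / 10;
     }
 
     /* ZFS side: the ARC lets go only once free memory < freetarg. */
     static bool
     arc_reclaims(unsigned long mem_free, unsigned long freetarg)
     {
         return mem_free < freetarg;
     }
 
     int
     main(void)
     {
         unsigned long kva_total = 16 * GB;
         unsigned long kva_free  = 1 * GB;   /* under the 10% line (1.6GB) */
         unsigned long mem_free  = 1 * GB;   /* still far above freetarg */
         unsigned long freetarg  = 16 * MB;  /* assumed, not the real value */
 
         /* prints "yes" then "no": pools drain while the ARC sits still */
         printf("pagedaemon draining pools: %s\n",
             pagedaemon_drains(kva_free, kva_total) ? "yes" : "no");
         printf("ZFS ARC reclaiming:        %s\n",
             arc_reclaims(mem_free, freetarg) ? "yes" : "no");
         return 0;
     }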
 
 That TLDR is a pretty good description of a problem I am (or was)
 seeing with the daily cron job that checks for core files.  On a DomU
 without a lot of memory, 12GB - 16GB, but a whole lot of ZFS filesets,
 that job would never complete and the guest would appear to lock up
 (it may in fact be any job that runs "find" across a ZFS fileset).
 To work around it I ended up commenting out the daily job.  The guest
 is my build system for the OS, and it would also bog down and
 eventually hang after a few OS builds, but that was a more manageable
 situation.
 
 With the simple patch Kardel provided, the daily job runs to
 completion and the system still appears responsive after a couple of
 days.  I have not had time to run builds to see how the patch affects
 that situation.  The guest has 2 vcpus, and I would sometimes abuse it
 pretty hard by running 3 builds at the same time with -j2 on the
 build.sh line.  Very often the system would hang at some point when I
 did this, and I had to back off to running only 1 or 2 at a time.
 
 >  Mitigation: allow ZFS to detect free KVA falling below 10% and
 >  start reclaiming memory.
 >  
 >  It is not related to XEN at all; ZFS plus large memory is enough
 >  for the problems to occur.
 >  The base issue is the big difference between the 10% free KVA
 >  limit and uvmexp.freetarg.
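 
 If I follow the mitigation correctly, the shape of the change would
 be something like the sketch below.  I have not compared this against
 the actual patch, and every name in it is made up; it only shows the
 idea of letting the ARC react to the same 10% KVA line the pagedaemon
 uses:
 
     #include <stdbool.h>
     #include <stdio.h>
 
     /* Stand-ins for kernel state; these are not the real symbols. */
     struct uvmexp_stub {
         unsigned long free;      /* free pages */
         unsigned long freetarg;  /* pagedaemon's free target */
     };
     static struct uvmexp_stub uvmexp = { 262144, 4096 };  /* sample values */
 
     static unsigned long kva_total(void) { return 16UL << 30; }
     static unsigned long kva_free(void)  { return  1UL << 30; }
 
     /*
      * Hypothetical ARC pressure check: keep the old freetarg
      * trigger, but also reclaim once free KVA drops below the
      * same 10% line that sends the pagedaemon into pool draining.
      */
     static bool
     arc_under_pressure(void)
     {
         if (uvmexp.free < uvmexp.freetarg)      /* existing trigger */
             return true;
         return kva_free() < kva_total() / 10;   /* added trigger */
     }
 
     int
     main(void)
     {
         /* with the sample numbers the added trigger fires: prints "yes" */
         printf("ARC under pressure: %s\n",
             arc_under_pressure() ? "yes" : "no");
         return 0;
     }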
 
 I am not sure that "large memory" needs to be all that large to prompt
 the problem.  The description of what happens when ZFS gobbles
 everything up is pretty close to what I am seeing...
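 
 To put rough numbers on it for a guest in my size range (assuming the
 relevant KVA arena is sized on the order of physical memory, and that
 freetarg works out to tens of MB at most - I have not checked the
 exact formula):
 
     10% of 12GB of KVA  ~= 1.2GB    <- pagedaemon starts draining pools
     uvmexp.freetarg     ~= tens of MB (assumed)
     window in between   ~= 1GB+ where only the other pools give back pages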
 
 >  I seem to be explaining the mechanism over and over again, and so
 >  far no one has verified this analysis.
 >  
 >  -Frank
 >  
 
 
 
 
 -- 
 Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
 

