NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



The following reply was made to PR kern/57558; it has been noted by GNATS.

From: Frank Kardel <kardel%netbsd.org@localhost>
To: Chuck Silvers <chuq%chuq.com@localhost>, gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)
Date: Sun, 5 May 2024 08:33:03 +0200

 My comments on your remark are below; a longer (re-)explanation follows 
 after that.
 
 On 05/05/24 01:07, Chuck Silvers wrote:
 > On Thu, May 02, 2024 at 08:50:02AM +0000, Frank Kardel wrote:
 >>   Thus arc.c needs an additional check to determine needfree if the KVA
 >>   free space falls below 10%.
 > The intention is that 64-bit kernels should configure enough KVA
 > to be able to map all of physical memory as kmem without running out of KVA.
 We are not really running out of KVA, so that is not violated. We are 
 running into a state with less than 10% of KVA free, with ZFS still 
 allocating KVA while other pools give up KVA, which then gets allocated 
 to ZFS, UNTIL we fall below uvmexp.freetarg.
 > I changed the general code to work that way a while back, do we do something
 > different on xen?
 The difference is large memory - see below for the reasoning (which can 
 be supported via measurements/traces)
 > -Chuck
 
 TLDR section at the end.
 
 The issue is not having too little KVA configured. The issue is that on
 large-memory systems (not XEN specific) the page daemon always tries to keep
 at least 10% of KVA free. See uvm_km.c:uvm_km_va_starved_p(void) and
 uvm_pdaemon.c:uvm_pageout(void *arg).
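 
 For reference, the starvation check, condensed from sys/uvm/uvm_km.c
 (exact details may differ between kernel versions), is roughly:
 
     bool
     uvm_km_va_starved_p(void)
     {
             vmem_size_t total;
             vmem_size_t free;
 
             if (kmem_arena == NULL)
                     return false;
 
             total = vmem_size(kmem_arena, VMEM_ALLOC | VMEM_FREE);
             free = vmem_size(kmem_arena, VMEM_FREE);
 
             /* starved once less than 10% of the kmem arena is free */
             return (free < (total / 10));
     }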
 
 With less than 10% of KVA free the local kmem_va_starved variable is true.
 This leads to skipping the UVM_UNLOCK_AND_WAIT(). Further on,
 usually no scan is done, as
      - needsfree is false, because there is enough free memory
        (uvmexp.freetarg is 4096 in this case)
      - needsscan is also false, because uvmpdpol_needsscan_p() does not
        return true at that time.
 But even if we did scan it would not help, as the target of the scan is
 to get to around uvmexp.freetarg free pages, and it would not drain any
 pools where ZFS is hogging memory.
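 
 A heavily condensed sketch of the relevant control flow in uvm_pageout()
 (not verbatim; names approximated from uvm_pdaemon.c, locking and details
 omitted):
 
     for (;;) {
             kmem_va_starved = uvm_km_va_starved_p();
 
             /* with KVA starved the pagedaemon never sleeps here */
             if ((uvm_pagedaemon_waiters == 0 || uvmexp.paging > 0) &&
                 !kmem_va_starved) {
                     UVM_UNLOCK_AND_WAIT(&uvm.pagedaemon, &uvmpd_lock,
                         false, "pgdaemon", 0);
             }
 
             /* both conditions depend on uvmexp.freetarg, not on KVA */
             needsfree = uvm_availmem(false) + uvmexp.paging <
                 uvmexp.freetarg;
             needsscan = needsfree || uvmpdpol_needsscan_p();
 
             if (needsscan)
                     uvmpd_scan();   /* scans towards freetarg only */
 
             /* done unless we need memory or KVA is starved */
             if (!needsfree && !kmem_va_starved)
                     continue;
 
             /* kick the pool drain thread and go around again */
             uvmpd_pool_drain_wakeup();
     }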
 
 Further down the pool drainer thread is kicked even though needsfree and
 needsscan are false, as those are mainly tied to uvmexp.freetarg.
 
 Following the pooldrain path, an attempt is made to reclaim idle pages
 from the pools. At this time most pools will give up idle pages, but ZFS
 will hold onto them. This is because ZFS determines that we are not falling
 below uvmexp.freetarg and thus does not kick the arc_reclaim thread to give
 up pages. So all the pool drain thread accomplishes is that (most of) the
 other pools give up their idle pages, while ZFS holds onto its pool
 allocations.
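 
 In effect, the decision on the ZFS side has the following shape (an
 illustrative sketch only, not the literal arc.c code; the function name
 is made up for illustration):
 
     static boolean_t
     zfs_memory_pressure_sketch(void)
     {
             /* the ARC only reacts to the pagedaemon's free target */
             if (uvm_availmem(false) < uvmexp.freetarg)
                     return (B_TRUE);    /* now the ARC gives pages back */
 
             return (B_FALSE);           /* KVA starvation is ignored */
     }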
 
 While the pool drain thread may dig up some more free pages, ZFS will keep
 allocating pool (KVA) memory as long as we are not below uvmexp.freetarg.
 While this goes on, more and more pools get reduced whenever possible
 (because some of their pages happen to be free at that moment). The effects
 are the system becoming very slow to respond, network buffers failing to be
 allocated, dropped network connections, and more.
 
 This relaxes a bit when free memory falls below uvmexp.freetarg, as at that
 point ZFS starts giving up pool memory. By then we are far below the 10% KVA
 starvation limit.
 
 While we are below the 10% limit but above uvmexp.freetarg, the pagedaemon
 happily spins while ZFS keeps allocating more and more KVA.
 
 So the problem is not a lack of available KVA. It is that ZFS keeps
 allocating KVA until we fall below uvmexp.freetarg. With larger-memory
 systems the gap between uvmexp.freetarg and 10% of KVA increases, and the
 problem becomes critical.
 
 Given the current mechanics, the pool memory for all non-ZFS pools is
 initially, in effect, limited to uvmexp.freetarg pages, which is not enough
 for reliable system operation.
 
 It is not a XEN issue.
 
 TLDR:
 - the pagedaemon aggressively starts pool draining once free KVA falls below 10%
 - ZFS won't free pool pages until free memory falls below uvmexp.freetarg
 - there is a huge gap between uvmexp.freetarg and the 10% KVA-free limit,
   and it increases with larger memory
 - while below 10% KVA free, ZFS eventually depletes all other pools that
   are cooperatively giving up pages, causing all sorts of shortages in
   other areas (visible in e.g. network buffers)
 
 Mitigation: allow ZFS to detect free KVA falling below 10% and start
 reclaiming memory at that point.
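 
 A minimal sketch of that mitigation, assuming the check is added to the
 ARC's reclaim decision (function name and placement are illustrative only,
 not a tested patch):
 
     static boolean_t
     zfs_memory_pressure_sketch(void)
     {
             /* existing trigger: free memory below the pagedaemon target */
             if (uvm_availmem(false) < uvmexp.freetarg)
                     return (B_TRUE);
 
             /* additional trigger: less than 10% of kmem KVA free */
             if (uvm_km_va_starved_p())
                     return (B_TRUE);
 
             return (B_FALSE);
     }
 
 With this, the ARC would start releasing pool pages as soon as the
 pagedaemon considers KVA starved, instead of waiting for free memory to
 drop below uvmexp.freetarg.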
 
 It is not related to XEN at all; ZFS plus large memory is sufficient for the
 problems to occur. The base issue is the big difference between the 10%
 free-KVA limit and uvmexp.freetarg.
 
 I seem to be explaining this mechanism over and over again, and so far no
 one has verified the analysis.
 
 -Frank
 

