NetBSD-Bugs archive


Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)



My comments on your remark are below - a longer (re-)explanation follows after that.

On 05/05/24 01:07, Chuck Silvers wrote:
> On Thu, May 02, 2024 at 08:50:02AM +0000, Frank Kardel wrote:
>> Thus arc.c needs an additional check to determine needfree if the kva
>> free space falls below 10%.
> The intention is that 64-bit kernels should configure enough KVA
> to be able to map all of physical memory as kmem without running out of KVA.

We are not really running out of KVA, so that intention is not violated. We are
running into less than 10% of KVA free while ZFS is still allocating KVA; other
pools give up KVA, which then gets allocated to ZFS, until free memory falls
below uvmexp.freetarg.

> I changed the general code to work that way a while back, do we do something
> different on xen?

The difference is large memory - see below for the reasoning (which can be
supported by measurements/traces).

> -Chuck

TLDR section at the end.

The issue is not having too little KVA configured. The issue is that on
large-memory systems (not XEN-specific) the page daemon always tries to keep
at least 10% of KVA free. See uvm_km.c:uvm_km_va_starved_p(void) and
uvm_pdaemon.c:uvm_pageout(void *arg).
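
For reference, the starvation check is simply "less than 10% of the kmem arena
is free"; paraphrased here from uvm_km.c (the exact source may differ slightly):

bool
uvm_km_va_starved_p(void)
{
	vmem_size_t total;
	vmem_size_t free;

	if (kmem_arena == NULL)
		return false;

	total = vmem_size(kmem_arena, VMEM_ALLOC | VMEM_FREE);
	free = vmem_size(kmem_arena, VMEM_FREE);

	/* starved when less than 10% of the kmem arena is free */
	return (free < (total / 10));
}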

With less than 10% of KVA free, the local kmem_va_starved variable is true.
This leads to skipping the UVM_UNLOCK_AND_WAIT(). Further on, usually no scan
is done, because
- needsfree is false, as there is enough free memory (uvmexp.freetarg is 4096 in this case), and
- needsscan is also false, as uvmpdpol_needsscan_p() does not return true at that time.
But even if we did scan, it would not help: the target of the scan is to get to
around uvmexp.freetarg free pages, and scanning does not drain the pools where
ZFS is hogging memory.

Further down, the pool_drainer thread is kicked while needsfree and needsscan
remain false, since both are mainly tied to uvmexp.freetarg.

Following the pool-drain path, an attempt is made to reclaim idle pages from
the pools. At this point most pools will give up their idle pages, but ZFS will
hold onto its own: ZFS determines that we are not falling below uvmexp.freetarg
and therefore does not kick the arc_reclaim thread to give up pages. So all the
pool_drainer thread accomplishes is that (most of) the other pools give up
their idle pages while ZFS holds onto its pool allocations.

While the pool_drainer thread may dig up some more free pages, ZFS keeps
allocating pool (KVA) memory as long as we are not below uvmexp.freetarg.
While this goes on, more and more of the other pools get reduced whenever
possible (because some of their pages happen to be free at the time). The
effects are a system that becomes very slow to respond, network buffers that
cannot be allocated, dropped network connections, and more.

This relaxes a bit once free memory falls below uvmexp.freetarg, because at
that point ZFS starts giving up pool memory. By then we are far below the 10%
KVA starvation limit.

While we are below the 10% limit but still above uvmexp.freetarg, the
pagedaemon happily spins while ZFS keeps allocating more and more KVA.

So the problem is not a lack of available KVA. The problem is that ZFS keeps
allocating KVA until we fall below uvmexp.freetarg. On larger-memory systems
the gap between uvmexp.freetarg and 10% of KVA grows, and the problem becomes
critical.
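
To put rough numbers on that gap (illustrative only, assuming 4 KB pages and a
kmem arena sized to cover physical memory): uvmexp.freetarg of 4096 pages is
just 16 MB, while 10% of KVA is about 6.4 GB on a 64 GB machine and about
25.6 GB on a 256 GB machine. The window in which the pagedaemon spins and ZFS
keeps allocating therefore grows roughly linearly with RAM.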

Given the current mechanics, the pool memory for all non-ZFS pools is, in
effect, initially limited to uvmexp.freetarg pages, which is not enough for
reliable system operation.

It is not a XEN issue.

TLDR:
- the pagedaemon aggressively starts pool draining once free KVA falls below 10%
- ZFS won't free pool pages until free memory falls below uvmexp.freetarg
- there is a huge gap between uvmexp.freetarg and 10% of KVA free, and it grows with larger memory
- while below 10% of KVA free, ZFS eventually depletes all the other pools that cooperatively give up pages, causing all sorts of shortages in other areas (visible e.g. in network buffers)

Mitigation: let ZFS detect free KVA falling below 10% and start reclaiming memory at that point.
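
Purely as an illustration of that mitigation (not a tested patch; the helper
name and the exact place in arc.c where it would be consulted are assumptions
on my side), arc.c could reuse the pagedaemon's own criterion:

/*
 * Hypothetical helper for arc.c: treat the pagedaemon's KVA starvation
 * condition (less than 10% of the kmem arena free) as memory pressure,
 * so the arc_reclaim thread starts returning pages before free memory
 * drops below uvmexp.freetarg.
 */
static int
arc_kva_pressure(void)
{
	if (uvm_km_va_starved_p())
		return 1;	/* less than 10% of kmem KVA free */
	return 0;
}

This would then be OR-ed into the existing decision that today only looks at
uvmexp.freetarg.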

This is not related to XEN at all; ZFS plus large memory is sufficient for the problems to occur. The base issue is the big difference between the 10% free-KVA limit and uvmexp.freetarg.

I seem to be explaining this mechanism over and over again, and so far no one has verified the analysis.

-Frank

