NetBSD-Bugs archive
Re: kern/57558: pgdaemon 100% busy - no scanning (ZFS case)
My comments on your remark are below; a longer (re-)explanation follows
after that.
On 05/05/24 01:07, Chuck Silvers wrote:
On Thu, May 02, 2024 at 08:50:02AM +0000, Frank Kardel wrote:
Thus arc.c needs an additional check that sets needfree when the free
KVA space falls below 10%.
The intention is that 64-bit kernels should configure enough KVA
to be able to map all of physical memory as kmem without running out of KVA.
We are not really running out of KVA; that invariant is not violated.
We are running into less than 10% free KVA while ZFS is still
allocating KVA; the other pools give up KVA, which then gets allocated
to ZFS, UNTIL we fall below uvmexp.freetarg.
I changed the general code to work that way a while back, do we do something
different on xen?
The difference is large memory - see below for the reasoning (which can
be supported by measurements/traces).
-Chuck
TLDR section at the end.
The issue is not having too little KVA configured. The issue is that on
large-memory systems (this is not XEN-specific) the page daemon always
tries to keep at least 10% of KVA free. See
uvm_km.c:uvm_km_va_starved_p(void) and uvm_pdaemon.c:uvm_pageout(void *arg).
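The starvation predicate can be sketched as follows. This is an
illustrative model of the 10% rule only, not the actual code from
sys/uvm/uvm_km.c; the function name and page-count units are
assumptions for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model of the uvm_km_va_starved_p() idea: the kernel
 * treats KVA as "starved" once less than 10% of the kmem arena is
 * free.  Names and units (pages) are assumptions for this sketch.
 */
static bool
kva_starved_p(uint64_t kva_total_pages, uint64_t kva_free_pages)
{
	/* starved when free KVA drops below 10% of the arena */
	return kva_free_pages < kva_total_pages / 10;
}
```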
With less than 10% KVA free, the local kmem_va_starved variable is true.
This leads to skipping the UVM_UNLOCK_AND_WAIT(). Further on,
usually no scan is done, because
- needsfree is false, as there is enough free memory
  (uvmexp.freetarg is 4096 in this case)
- needsscan is also false, as uvmpdpol_needsscan_p() does not return
  true at that time.
But even if we did scan, it would not help: the target of the scan is
to get to around uvmexp.freetarg free pages, and it would not drain any
pools where ZFS is hogging memory.
Further down, the pool_drainer thread is kicked, as needsfree and
needsscan are false; both are mainly bound to uvmexp.freetarg.
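The decision the pagedaemon ends up making in this scenario can be
modelled roughly as below. The names (pd_decide and its fields) are
hypothetical for this sketch and do not appear in uvm_pageout():

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough model of the pagedaemon loop described above (hypothetical
 * names, not kernel code): when KVA is starved the daemon skips the
 * sleep, but needsfree stays false because free memory is still far
 * above uvmexp.freetarg, so the only action left is pool draining. */
struct pd_decision {
	bool skip_wait;    /* skip UVM_UNLOCK_AND_WAIT() */
	bool needsfree;    /* free pages below uvmexp.freetarg */
	bool kick_drainer; /* wake the pool_drainer thread */
};

static struct pd_decision
pd_decide(uint64_t free_pages, uint64_t freetarg,
    bool kva_starved, bool needsscan)
{
	struct pd_decision d;
	d.skip_wait = kva_starved;
	d.needsfree = free_pages < freetarg;
	/* no scan would help, so kick the drainer and spin */
	d.kick_drainer = kva_starved && !d.needsfree && !needsscan;
	return d;
}
```

With plenty of free memory but starved KVA (the large-memory case
described above), this model skips the wait, reports no need to free,
and only kicks the drainer.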
Following the pool-drain path, an attempt is made to reclaim idle pages
from the pools.
At this time most pools will give up their idle pages, but ZFS will
hold onto them. This is because ZFS determines that we are not falling
below uvmexp.freetarg and thus does not kick the arc_reclaim thread to
give up pages. So all the pool thread accomplishes is that (most) other
pools give up their idle pages while ZFS holds onto its pool allocations.
While the pool thread may dig up some more free pages, ZFS will keep
allocating pool (KVA) memory as long as we are not below
uvmexp.freetarg. While this goes on, more and more pools get reduced
whenever possible (because some of their pages were currently free).
The effects are the system becoming very slow to respond, network
buffers failing to allocate, dropped network connections, and more.
This relaxes a bit once free memory falls below uvmexp.freetarg,
because at that point ZFS starts giving up pool memory. By then we are
far below the 10% KVA starvation limit.
While below the 10% limit but above uvmexp.freetarg, the pagedaemon
happily spins while ZFS keeps allocating more and more KVA.
So the problem is not a lack of available KVA. It is that ZFS keeps
allocating KVA until we fall below uvmexp.freetarg. On larger-memory
systems the gap between uvmexp.freetarg and 10% of KVA increases, and
the problem becomes critical.
Given the current mechanics, the pool memory for all non-ZFS pools is
initially effectively limited to uvmexp.freetarg pages, which is not
enough for reliable system operation.
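To put numbers on that gap, assume (purely for illustration) a machine
with 256 GiB of RAM, 4 KiB pages, KVA sized to cover all of physical
memory, and the uvmexp.freetarg of 4096 pages reported above:

```c
#include <stdint.h>

/* Back-of-envelope helper: the 10% free-KVA threshold in pages,
 * assuming KVA is sized to map all of physical memory.  All inputs
 * are illustrative assumptions, not measured values. */
static uint64_t
kva_10pct_threshold(uint64_t ram_bytes, uint64_t page_size)
{
	return (ram_bytes / page_size) / 10;
}
```

With 256 GiB of RAM this gives about 6.7 million pages (~25 GiB) that
the pagedaemon tries to keep free in KVA, versus a freetarg of 4096
pages (16 MiB): a gap of three orders of magnitude, and it grows
linearly with installed memory.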
It is not a XEN issue.
TLDR:
- the pagedaemon aggressively starts pool draining once free KVA falls
  below 10%
- ZFS won't free pool pages until free memory falls below uvmexp.freetarg
- there is a huge gap between uvmexp.freetarg and the 10% free-KVA
  threshold, and it grows with larger memory
- while below 10% free KVA, ZFS eventually depletes all other pools
  that cooperatively give up pages, causing all sorts of shortages in
  other areas (visible in e.g. network buffers)
Mitigation: allow ZFS to detect free KVA falling below 10% and start
reclaiming memory.
It is not related to XEN at all; ZFS plus large memory is sufficient
for the problems to occur. The base issue is the big difference between
the 10% free-KVA limit and uvmexp.freetarg.
I seem to explain the mechanism over and over again. And so far no one
has verified this analysis.
-Frank