Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang

To: gnats-bugs%netbsd.org@localhost
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
From: Brad Spencer <brad%anduin.eldar.org@localhost>
Date: Sat, 30 Jul 2022 07:19:41 -0400

Chuck Silvers <chuq%chuq.com@localhost> writes:

> The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
>

[snip]

>  the "paging=16" indicates that some page-outs are already in progress
>  and not completing.
>  
>  if you do the "show uvmexp" twice with a few seconds in between
>  and none of these counters are changing, then various threads are
>  probably stuck on something.  please collect a stack trace with ddb
>  of the pagedaemon thread ("pgdaemon" in ddb ps) and all of the
>  "softbio" worker threads (there will be one for each CPU).  you might
>  as well also include the "km_getwait2" thread that you mentioned above.
>  please also send me the complete output from ddb "ps" so I can see
>  if there are any other threads that look interesting.
>  
>  you may need to collect stack traces from various zfs kernel threads
>  as well, but there are probably a lot of them, so hopefully we can
>  narrow down which ones are interesting so that you don't need to
>  get stack traces for all of them.
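
(An aside on the "show uvmexp twice" check suggested above: it amounts
to diffing two captured dumps.  The following is only a rough sketch,
not something from the PR.  It assumes the counters appear in the
captured console text as name=value tokens, like the "paging=16"
quoted above.  The function names and the sample values in the usage
lines are made up for illustration.)

```python
import re

# Assumption: counters show up as name=value tokens (e.g. "paging=16").
# Anything printed in another layout is simply ignored by this sketch.
TOKEN = re.compile(r"([A-Za-z][\w-]*)=(\d+)")


def parse_counters(text):
    """Collect name=value integer tokens from a captured console dump."""
    return {name: int(value) for name, value in TOKEN.findall(text)}


def changed_counters(first, second):
    """Return {name: (old, new)} for counters that differ between dumps."""
    a, b = parse_counters(first), parse_counters(second)
    return {n: (a[n], b[n]) for n in a.keys() & b.keys() if a[n] != b[n]}


if __name__ == "__main__":
    # Hypothetical sample text, not real output from the hung guest.
    dump1 = "free=120 paging=16"
    dump2 = "free=120 paging=16"
    diff = changed_counters(dump1, dump2)
    if not diff:
        print("no counters changed between the two dumps")
    for name, (old, new) in sorted(diff.items()):
        print(f"{name}: {old} -> {new}")
```

If nothing changes between two dumps taken a few seconds apart, that
matches the "threads are probably stuck" case described above.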

The system was hung this morning, probably tripped by something run
from /etc/daily.  I have collected some of the information you asked
for and attached it to this reply as cleaned-up typescript output from
the guest console.  The Xen guest is a test system for -current and is
not doing much right now other than running /etc/daily (when I enable
it).

I can probably reproduce the zfs receive case if needed.  I left the
guest in ddb should you be interested in something else right now but
will probably reboot it at some point.

One small data point with this hang: xennet complained on the guest
console about having no rx buffers.  Another data point, which comes up
again below: this guest does not have any additional limit on the
number of allowed vnodes.

>  > In the hard hang case, the number of "free" would be much larger, so I
>  > suspect something else is running out of resources at this point (the
>  > number for free hints at that perhaps pointing to your free page
>  > comment).  I also noticed that the pool called "zio_data_buf_51" of size
>  > 1024 didn't grow much about 16,100 with this patch, as opposed to around
>  > 30,000 with the hard hang.  Limiting the number of vnodes didn't seem to
>  > effect the behavior of the softer hang.  I may have also noticed that
>  > the system was paging to swap even though all that was going on was a
>  > zfs receive over a ssh connection.
>  
>  the previous hang was probably due to running out of kernel virtual space,
>  whereas this hang is due to running out of free physical pages.
>  
>  limiting the number of vnodes does not directly limit ARC memory usage.
>  
>  the "zfs receive" is allocating nearly all of the physical memory of
>  the system to kernel usage (ie. the ZFS ARC), and the VM mechanism to
>  apply back-pressure on kernel memory allocations is limited.
>  I'm not sure yet how ZFS is supposed to avoid pushing everything else
>  out of memory.  there may well be other mechanisms that are not
>  hooked up properly either, similar to zfs_arc_free_target before
>  the current patch.

My personal opinion has been that something in ZFS is leaking (for
some definition of the word "leak").  It is clear that the "zfs receive"
case trips the problem very quickly, especially for me if I am receiving
a compressed file set (i.e. a zfs send -R where one of the file sets is
compressed).  However, just using the file set (i.e. reading and
writing, or even just reading, like the find-the-core-files check from
/etc/daily) will also cause trouble over time.

I have an OS-building guest running 9.2 that uses ZFS a lot, both for
source and build artifacts.  Without limiting the number of vnodes, I
can run a "build.sh release" about 1.5 times before the system hangs.
With the number of vnodes limited quite severely, I get 3 or so
"build.sh release" runs before a hang.  In that use case, the limit
helps quite a bit.

>  is the behavior with the current patch worse in any way than the behavior
>  from before my previous change to arc.c on may 4?

I would say that there was an improvement.  With the "zfs receive" test
I was able to receive about 2x or 3x more data before the hang.  I was
also able to enable /etc/daily; it succeeded in running once and hung
on the second day.  Before this patch, running /etc/daily would trip
the hang every time, which was why I had disabled it.

>  -Chuck
>  

-- 
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org

Attachment: ddb_output_2022-07-30_1
Description: cleaned up typescript output from ddb
