Chuck Silvers <chuq%chuq.com@localhost> writes:

> The following reply was made to PR port-evbarm/56944; it has been noted
> by GNATS.

> [snip]

> the "paging=16" indicates that some page-outs are already in progress
> and not completing.
>
> if you do the "show uvmexp" twice with a few seconds in between
> and none of these counters are changing, then various threads are
> probably stuck on something.  please collect a stack trace with ddb
> of the pagedaemon thread ("pgdaemon" in ddb ps) and all of the
> "softbio" worker threads (there will be one for each CPU).  you might
> as well also include the "km_getwait2" thread that you mentioned above.
> please also send me the complete output from ddb "ps" so I can see
> if there are any other threads that look interesting.
>
> you may need to collect stack traces from various zfs kernel threads
> as well, but there are probably a lot of them, so hopefully we can
> narrow down which ones are interesting so that you don't need to
> get stack traces for all of them.

The system was hung up this morning, probably tripped by one of the jobs
run from /etc/daily.  I have collected some of the information you asked
for and attached it to this reply as a cleaned-up typescript of the
output from the guest console.  The Xen guest is a test system for
-current and is not doing much other than running /etc/daily (when I
enable it) right now.  I can probably reproduce the "zfs receive" case
if needed.  I left the guest in ddb in case you are interested in
anything else right now, but will probably reboot it at some point.

Another small data point with this hang: the guest console showed
complaints from xennet about having no rx buffers.  Another data point,
mentioned again below: this guest does not have any further limit on the
number of allowed vnodes.

> > In the hard hang case, the number of "free" would be much larger, so I
> > suspect something else is running out of resources at this point (the
> > number for free hints at that, perhaps pointing to your free page
> > comment).  I also noticed that the pool called "zio_data_buf_51" of size
> > 1024 didn't grow much above 16,100 with this patch, as opposed to around
> > 30,000 with the hard hang.  Limiting the number of vnodes didn't seem to
> > affect the behavior of the softer hang.  I may have also noticed that
> > the system was paging to swap even though all that was going on was a
> > zfs receive over an ssh connection.
>
> the previous hang was probably due to running out of kernel virtual space,
> whereas this hang is due to running out of free physical pages.
>
> limiting the number of vnodes does not directly limit ARC memory usage.
>
> the "zfs receive" is allocating nearly all of the physical memory of
> the system to kernel usage (ie. the ZFS ARC), and the VM mechanism to
> apply back-pressure on kernel memory allocations is limited.
> I'm not sure yet how ZFS is supposed to avoid pushing everything else
> out of memory.  there may well be other mechanisms that are not
> hooked up properly either, similar to zfs_arc_free_target before
> the current patch.

I have been of the personal opinion that there is something in ZFS that
is leaking (for some definition of the word "leak").  It is clear that
the "zfs receive" case trips the problem very quickly, especially for me
when receiving a compressed file set (i.e. a "zfs send -R" where one of
the file sets is compressed).
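For concreteness, the reproduction looks roughly like this; the pool,
file set, snapshot, and host names below are placeholders, not the real
ones:

    # recursive replication stream in which one of the child file sets
    # is compressed, received on the -current Xen guest
    zfs send -R tank/data@snap | ssh current-guest zfs receive -d tank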
However, just using the file set (i.e. reading and writing, but even
just reading, like the find-the-core-files check from /etc/daily) will
also cause trouble over time.  I have an OS-building guest running 9.2
that uses ZFS a lot, both for sources and for build artifacts.  Without
limiting the number of vnodes I can run "build.sh release" about 1.5
times before the system hangs up.  With the number of vnodes limited, by
a whole lot, I get through 3 or so "build.sh release" runs before a
hang.  In that use case, the limit helps quite a bit.

> is the behavior with the current patch worse in any way than the behavior
> from before my previous change to arc.c on may 4?

I would say that there was an improvement.  With the "zfs receive" test
I was able to receive about 2x or 3x more data before the hang.  I was
also able to enable /etc/daily again: it succeeded in running once and
hung up on the second day.  Before this patch, running /etc/daily would
trip the hang every time, which is why I had disabled it.

> -Chuck

--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
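For reference, the information in the attached typescript was collected
with something like the following ddb sequence; the "bt/a" arguments are
placeholders for the lwp addresses taken from the "ps" listing, and the
prompt will vary with the CPU:

    db{0}> show uvmexp
    (wait a few seconds, then repeat to compare the counters)
    db{0}> show uvmexp
    db{0}> ps
    db{0}> bt/a <lwp address of "pgdaemon">
    db{0}> bt/a <lwp address of each "softbio/N" worker>
    db{0}> bt/a <lwp address of the thread waiting in "km_getwait2">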
Attachment:
ddb_output_2022-07-30_1
Description: cleaned up typescript output from ddb