Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang

To: port-evbarm-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,pjledge%me.com@localhost
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang
From: Chuck Silvers <chuq%chuq.com@localhost>
Date: Sat, 30 Jul 2022 10:30:03 +0000 (UTC)

The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.

From: Chuck Silvers <chuq%chuq.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
 VM results in kernel thread running away and filesystem hang
Date: Sat, 30 Jul 2022 03:28:16 -0700

 On Thu, Jul 28, 2022 at 08:36:34AM -0400, Brad Spencer wrote:
 > Chuck Silvers <chuq%chuq.com@localhost> writes:
 > >  my current patch with both of these changes is attached.
 > >  
 > >  -Chuck
 > >  
 > 
 > [patch snipped]
 > 
 > I applied the patch to a Xen amd64 DOMU and performed the test that
 > hangs.  It will still cause the system to hang, but instead of a
 > complete hard hang, there is something more akin to a soft hang.
 > Nothing really responds any more on the guest (can't log into the
 > console, for example, but you can type your username), but at least
 > CTRL-T still works.  A shell was stuck in "flt_noram5" and another in
 > "km_getwait2".  In DDB on the guest console the UVM stats are thus:

 "flt_noram5" is trying to allocate a page to resolve a copy-on-write fault
 for a user mapping.  this is normal when memory is low.

 "km_getwait2" is trying to allocate a page for kernel memory.
 this is also normal.

 > db{0}> show uvmexp
 > Current UVM status:
 >   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
 >   247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free

 the "5 free" is equal to "resv-kernel=5" below.  when the count of free
 pages drops to this threshold, pages can only be allocated for certain
 kernel uses, and everything else has to wait for some pages to be reclaimed.
 the question is why pages aren't being reclaimed from anything,
 either ZFS ARC buffers or VM pages being used for other things.
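
 as a rough illustration of the threshold described above, here is a toy
 model in Python.  it is not the real UVM allocator: the function name and
 flags are made up for this sketch, and treating "resv-pg" as a smaller
 reserve usable only by the pagedaemon is an assumption.

```python
def can_allocate(free, resv_kernel, resv_pg, *,
                 may_use_kernel_reserve=False, is_pagedaemon=False):
    """Toy model of a free-page reserve check (hypothetical, not UVM code)."""
    if free <= 0:
        # Nothing left to hand out, whoever is asking.
        return False
    # Assumption: the last resv_pg pages are held back for the pagedaemon.
    if free <= resv_pg and not is_pagedaemon:
        return False
    # At or below resv_kernel, only callers that are allowed to dip into
    # the kernel reserve succeed; everyone else has to wait.
    if free <= resv_kernel and not (may_use_kernel_reserve or is_pagedaemon):
        return False
    return True
```

 with the numbers from this dump (free=5, resv-kernel=5, resv-pg=1),
 can_allocate(5, 5, 1) is False, while the same call with
 may_use_kernel_reserve=True is True.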

 >   pages  8893 anon, 3648 file, 3010 exec
 >   freemin=256, free-target=341, wired-max=82512
 >   resv-pg=1, resv-kernel=5
 >   bootpages=7737, poolpages=228145
 >   faults=118126, traps=113048, intrs=426958, ctxswitch=527493
 >    softint=143156, syscalls=2102209
 >   fault counts:
 >     noram=3, noanon=0, pgwait=0, pgrele=0
 >     ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
 >     neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
 >     cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
 >   daemon and swap counts:
 >     woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
 >     busy=0, freed=10736, reactivate=179, deactivate=26203
 >     pageouts=145, pending=2156, nswget=5
 >     nswapdev=1, swpgavail=1048575
 >     swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16

 the "paging=16" indicates that some page-outs are already in progress
 and not completing.

 if you do the "show uvmexp" twice with a few seconds in between
 and none of these counters are changing, then various threads are
 probably stuck on something.  please collect a stack trace with ddb
 of the pagedaemon thread ("pgdaemon" in ddb ps) and all of the
 "softbio" worker threads (there will be one for each CPU).  you might
 as well also include the "km_getwait2" thread that you mentioned above.
 please also send me the complete output from ddb "ps" so I can see
 if there are any other threads that look interesting.
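
 if the console output is being captured to a file, comparing two
 "show uvmexp" dumps can be automated.  the following is only a sketch:
 the helper names are hypothetical, and it only picks up plain "name=number"
 fields, skipping compound ones such as "relocks(total)=1103(1103)" and
 hex values such as "pagemask=0xfff".

```python
import re

# Matches simple "name=decimal" fields; hex values and fields whose name
# ends in a parenthesised part are deliberately not matched.
_COUNTER = re.compile(r"\b([A-Za-z][\w-]*)=(\d+)\b")


def parse_counters(text):
    """Return {name: value} for every simple counter found in a dump."""
    return {name: int(value) for name, value in _COUNTER.findall(text)}


def changed_counters(first, second):
    """Return {name: (old, new)} for counters that differ between dumps."""
    a, b = parse_counters(first), parse_counters(second)
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}
```

 an empty result from changed_counters() for two dumps taken a few seconds
 apart would match the "nothing is moving" case described above.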

 you may need to collect stack traces from various zfs kernel threads
 as well, but there are probably a lot of them, so hopefully we can
 narrow down which ones are interesting so that you don't need to
 get stack traces for all of them.

 > In the hard hang case, the number of "free" would be much larger, so I
 > suspect something else is running out of resources at this point (the
 > number for free hints at that perhaps pointing to your free page
 > comment).  I also noticed that the pool called "zio_data_buf_51" of size
 > 1024 didn't grow much above 16,100 with this patch, as opposed to around
 > 30,000 with the hard hang.  Limiting the number of vnodes didn't seem to
 > affect the behavior of the softer hang.  I may have also noticed that
 > the system was paging to swap even though all that was going on was a
 > zfs receive over a ssh connection.

 the previous hang was probably due to running out of kernel virtual space,
 whereas this hang is due to running out of free physical pages.

 limiting the number of vnodes does not directly limit ARC memory usage.

 the "zfs receive" is allocating nearly all of the physical memory of
 the system to kernel usage (i.e. the ZFS ARC), and the VM mechanism to
 apply back-pressure on kernel memory allocations is limited.
 I'm not sure yet how ZFS is supposed to avoid pushing everything else
 out of memory.  there may well be other mechanisms that are not
 hooked up properly either, similar to zfs_arc_free_target before
 the current patch.
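
 to put a number on "nearly all of the physical memory": a quick
 calculation from the uvmexp dump above, assuming "poolpages" counts pages
 currently held by kernel pool allocators (which is where the ARC buffers
 would be expected to live).  the constant and function names are just for
 this sketch.

```python
PAGE_SIZE = 4096      # pagesize from the dump
TOTAL_PAGES = 247536  # "247536 VM pages"
POOL_PAGES = 228145   # poolpages from the dump


def mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


pool_fraction = POOL_PAGES / TOTAL_PAGES
print(f"total: {mib(TOTAL_PAGES):.0f} MiB, "
      f"pools: {mib(POOL_PAGES):.0f} MiB ({pool_fraction:.1%})")
```

 under that assumption, roughly 92% of the VM-managed pages are sitting in
 kernel pools at the time of the hang.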

 is the behavior with the current patch worse in any way than the behavior
 from before my previous change to arc.c on May 4?

 -Chuck
