NetBSD-Bugs archive


Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang



I just tried Chuck’s latest patch and was able to transfer data for about 3 hours before the kernel thread got into its runaway loop, up from about 15 minutes. So an improvement, but not resolved.

I’ll see if I can get a DDB session running next time.
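
For reference, my rough plan for that (a sketch only; the exact break-in method depends on the console setup) is:

    # make sure a console break will drop into DDB
    sysctl ddb.fromconsole        # should be 1 on a kernel with "options DDB"

    # once the thread is spinning, send a break (or the console's DDB
    # hotkey), then poke around:
    db{0}> ps              look for the runaway thread and the wchans
    db{0}> trace           stack trace on the current CPU
    db{0}> show uvmexp     the stats Brad posted below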

> On Jul 28, 2022, at 8:40 AM, Brad Spencer <brad%anduin.eldar.org@localhost> wrote:
> 
> The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
> 
> From: Brad Spencer <brad%anduin.eldar.org@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc: port-evbarm-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
>        netbsd-bugs%netbsd.org@localhost, pjledge%me.com@localhost
> Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
> VM results in kernel thread running away and filesystem hang
> Date: Thu, 28 Jul 2022 08:36:34 -0400
> 
> Chuck Silvers <chuq%chuq.com@localhost> writes:
> 
> [snip]
> 
>> with the arbitrary limit on kernel virtual space removed and
>> zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
>> I suspect this is because enough kernel memory is accessed via the direct map
>> rather than being mapped in the kernel heap that the system always runs out
>> of free pages before it runs out of free kva.
>> 
>> my current patch with both of these changes is attached.
>> 
>> -Chuck
>> 
> 
> [patch snipped]
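
For anyone following along: the zfs_arc_free_target fix Chuck mentions controls when the ARC starts handing memory back. A minimal sketch of the idea, with the wiring assumed rather than copied from the actual patch, is to derive the ARC's threshold from UVM's own free-page target:

    /*
     * Sketch only, not Chuck's patch: initialize the ARC reclaim
     * threshold from the page daemon's free target (uvmexp.freetarg,
     * the "free-target" value in the uvmexp output below) so the ARC
     * shrinks before UVM runs out of free pages.
     */
    #include <uvm/uvm_extern.h>

    extern uint64_t zfs_arc_free_target;  /* ZFS ARC tunable; type assumed */

    static void
    arc_free_target_init(void)
    {
            zfs_arc_free_target = uvmexp.freetarg;
    }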
> 
> I applied the patch to a Xen amd64 DOMU and performed the test that
> hangs.  It will still cause the system to hang, but instead of a
> complete hard hang, there is something more akin to a soft hang.
> Nothing really responds anymore on the guest (you can't log into the
> console, for example, though you can type your username), but at least
> CTRL-T still works.  A shell was stuck in "flt_noram5" and another in
> "km_getwait2".  In DDB on the guest console the UVM stats are thus:
> 
> db{0}> show uvmexp
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
>   247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
>   pages  8893 anon, 3648 file, 3010 exec
>   freemin=256, free-target=341, wired-max=82512
>   resv-pg=1, resv-kernel=5
>   bootpages=7737, poolpages=228145
>   faults=118126, traps=113048, intrs=426958, ctxswitch=527493
>    softint=143156, syscalls=2102209
>   fault counts:
>     noram=3, noanon=0, pgwait=0, pgrele=0
>     ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
>     neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
>     cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
>   daemon and swap counts:
>     woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
>     busy=0, freed=10736, reactivate=179, deactivate=26203
>     pageouts=145, pending=2156, nswget=5
>     nswapdev=1, swpgavail=1048575
>     swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
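
Those "flt_noram5" and "km_getwait2" wait channels are UVM sleeping for free pages: both the fault path and the kernel-memory allocator loop, waking the page daemon and waiting. Roughly, as a simplified sketch and not the verbatim NetBSD source:

    /*
     * Simplified sketch of the allocate-or-sleep pattern behind wchans
     * like "flt_noram5" (page fault path) and "km_getwait2" (kernel
     * memory allocation).  With 5 pages free against freemin=256,
     * every caller ends up parked in a loop like this one.
     */
    #include <uvm/uvm.h>

    static struct vm_page *
    alloc_page_or_wait(struct uvm_object *uobj, voff_t off)
    {
            struct vm_page *pg;

            for (;;) {
                    pg = uvm_pagealloc(uobj, off, NULL, 0);
                    if (pg != NULL)
                            return pg;
                    /* wakes the pagedaemon, then sleeps on the named wchan */
                    uvm_wait("flt_noram5");
            }
    }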
> 
> In the hard hang case, the number of "free" pages would be much
> larger, so I suspect something else is running out of resources at
> this point (the very low free count here hints at that, perhaps
> pointing to your free-page comment).  I also noticed that the pool
> called "zio_data_buf_51" of size 1024 didn't grow much above 16,100
> with this patch, as opposed to around 30,000 with the hard hang.
> Limiting the number of vnodes didn't seem to affect the behavior of
> the softer hang.  I may have also noticed that the system was paging
> to swap even though all that was going on was a zfs receive over an
> ssh connection.
> 
> 
> 
> -- 
> Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
> 
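
A couple of commands that may help anyone else reproducing this (stock NetBSD tools; the exact pool names and numbers are whatever your system shows):

    # watch the zio_data_buf pools grow while the transfer runs
    vmstat -m | grep zio_data_buf

    # check/cap the vnode count (Brad notes capping it didn't change
    # the soft hang); 100000 is just an example value
    sysctl kern.maxvnodes
    sysctl -w kern.maxvnodes=100000

    # CTRL-T on a stuck shell sends SIGINFO, which prints its state
    # and wchan ("flt_noram5", "km_getwait2") on the terminal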


