NetBSD-Bugs archive


Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang



The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.

From: Paul Lavoie <pjledge%me.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: port-evbarm-maintainer%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM
 results in kernel thread running away and filesystem hang
Date: Thu, 28 Jul 2022 12:43:46 -0400

 I just tried Chuck's latest patch, and was able to transfer data for
 about 3 hours before the kernel thread got into the loop, up from
 about 15 minutes. So an improvement, but not resolved.
 
 I'll see if I can get a DDB session running next time.
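 
 (A sketch of what I have in mind, assuming the kernel in the VM was
 built with "options DDB" and that the VM's console can deliver a
 BREAK -- neither of which I've verified for this setup yet:
 
     sysctl -w ddb.onpanic=1      # enter DDB instead of rebooting on panic
     sysctl -w ddb.fromconsole=1  # allow dropping into DDB from the console
 
 then send a BREAK on the console once the hang occurs and poke around
 with "show uvmexp" and friends.)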
 
 > On Jul 28, 2022, at 8:40 AM, Brad Spencer <brad%anduin.eldar.org@localhost> wrote:
 >
 > The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.
 >
 > From: Brad Spencer <brad%anduin.eldar.org@localhost>
 > To: gnats-bugs%netbsd.org@localhost
 > Cc: port-evbarm-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
 >        netbsd-bugs%netbsd.org@localhost, pjledge%me.com@localhost
 > Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
 > VM results in kernel thread running away and filesystem hang
 > Date: Thu, 28 Jul 2022 08:36:34 -0400
 >
 > Chuck Silvers <chuq%chuq.com@localhost> writes:
 >
 > [snip]
 >
 >> with the arbitrary limit on kernel virtual space removed and
 >> zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
 >> I suspect this is because enough kernel memory is accessed via the direct map
 >> rather than being mapped in the kernel heap that the system always runs out
 >> of free pages before it runs out of free kva.
 >>
 >> my current patch with both of these changes is attached.
 >>
 >> -Chuck
 >>
 >
 > [patch snipped]
 >
 > I applied the patch to a Xen amd64 DOMU and performed the test that
 > hangs.  It will still cause the system to hang, but instead of a
 > complete hard hang, there is something more akin to a soft hang.
 > Nothing really responds any more on the guest (you can't log into the
 > console, for example, although you can type your username), but at
 > least CTRL-T still works.  A shell was stuck in "flt_noram5" and
 > another in "km_getwait2".  In DDB on the guest console the UVM stats
 > are thus:
 >
 > db{0}> show uvmexp
 > Current UVM status:
 >   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
 >   247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
 >   pages  8893 anon, 3648 file, 3010 exec
 >   freemin=256, free-target=341, wired-max=82512
 >   resv-pg=1, resv-kernel=5
 >   bootpages=7737, poolpages=228145
 >   faults=118126, traps=113048, intrs=426958, ctxswitch=527493
 >    softint=143156, syscalls=2102209
 >   fault counts:
 >     noram=3, noanon=0, pgwait=0, pgrele=0
 >     ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
 >     neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
 >     cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
 >   daemon and swap counts:
 >     woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
 >     busy=0, freed=10736, reactivate=179, deactivate=26203
 >     pageouts=145, pending=2156, nswget=5
 >     nswapdev=1, swpgavail=1048575
 >     swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
 >
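 > For anyone watching a box head toward this state, the same counters
 > can be polled from userland with the stock tools before it wedges;
 > a minimal loop, nothing specific to this patch:
 >
 >     # watch free pages and the suspect ZFS pool once a second
 >     while sleep 1; do
 >         vmstat -s | grep 'pages free'
 >         vmstat -m | grep zio_data_buf
 >     done
 >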
 > In the hard hang case, the number of "free" would be much larger, so I
 > suspect something else is running out of resources at this point (the
 > low free count hints at that, perhaps pointing to your free page
 > comment).  I also noticed that the pool called "zio_data_buf_51" of size
 > 1024 didn't grow much above 16,100 with this patch, as opposed to around
 > 30,000 with the hard hang.  Limiting the number of vnodes didn't seem to
 > affect the behavior of the softer hang.  I may have also noticed that
 > the system was paging to swap even though all that was going on was a
 > zfs receive over a ssh connection.
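 >
 > (For anyone repeating the vnode experiment: the usual knob on NetBSD
 > is the kern.maxvnodes sysctl; the value below is only illustrative.
 >
 >     # check the current limit, then lower it
 >     sysctl kern.maxvnodes
 >     sysctl -w kern.maxvnodes=50000
 >
 > Lowering it made no visible difference to the soft hang here.)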
 >
 >
 > --
 > Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
 >
 


