NetBSD-Bugs archive


Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1 VM results in kernel thread running away and filesystem hang

The following reply was made to PR port-evbarm/56944; it has been noted by GNATS.

From: Brad Spencer <>
Subject: Re: port-evbarm/56944: ZFS heavy usage on NetBSD running in Mac M1
 VM results in kernel thread running away and filesystem hang
Date: Thu, 28 Jul 2022 08:36:34 -0400

 Chuck Silvers <> writes:
 >  with the arbitrary limit on kernel virtual space removed and
 >  zfs_arc_free_target fixed, this doesn't appear to be a problem in practice.
 >  I suspect this is because enough kernel memory is accessed via the direct map
 >  rather than being mapped in the kernel heap that the system always runs out
 >  of free pages before it runs out of free kva.
 >  my current patch with both of these changes is attached.
 >  -Chuck
 [patch snipped]
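 For anyone reading without the patch in front of them, the
 zfs_arc_free_target change described above amounts to something like
 the following sketch.  This is an illustration of the idea, not the
 actual diff; the function names follow the FreeBSD-derived ARC code,
 and the uvmexp fields are the same ones shown in the DDB output
 further down.
 
 #include <uvm/uvm_extern.h>
 
 static uint64_t zfs_arc_free_target;	/* in pages */
 
 static void
 arc_free_target_init(void)
 {
 	/* Track UVM's own target (the "free-target" line in DDB). */
 	zfs_arc_free_target = uvmexp.freetarg;
 }
 
 static int64_t
 arc_available_memory(void)
 {
 	/*
 	 * A negative return means memory pressure: the ARC reclaim
 	 * thread should evict buffers.  With a wrong (e.g. zero)
 	 * free target, the ARC never sees pressure and grows until
 	 * it fights the pagedaemon for the last free pages.
 	 */
 	return ((int64_t)uvmexp.free - (int64_t)zfs_arc_free_target)
 	    * PAGE_SIZE;
 }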
 I applied the patch to a Xen amd64 DOMU and performed the test that
 hangs.  It still causes the system to hang, but instead of a complete
 hard hang there is something more akin to a soft hang.  Nothing on
 the guest really responds any more (you can type a username at the
 console, for example, but can't complete a login), but at least
 CTRL-T still works.  One shell was stuck in "flt_noram5" and another
 in "km_getwait2"; a schematic of the loop behind both wait channels
 follows the DDB output below.  In DDB on the guest console the UVM
 stats are thus:
 db{0}> show uvmexp
 Current UVM status:
   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=16
   247536 VM pages: 7084 active, 3321 inactive, 5130 wired, 5 free
   pages  8893 anon, 3648 file, 3010 exec
   freemin=256, free-target=341, wired-max=82512
   resv-pg=1, resv-kernel=5
   bootpages=7737, poolpages=228145
   faults=118126, traps=113048, intrs=426958, ctxswitch=527493
    softint=143156, syscalls=2102209
   fault counts:
     noram=3, noanon=0, pgwait=0, pgrele=0
     ok relocks(total)=1103(1103), anget(retrys)=25680(5), amapcopy=15229
     neighbor anon/obj pg=20191/186916, gets(lock/unlock)=59508/1100
     cases: anon=14483, anoncow=11195, obj=45762, prcopy=13743, przero=31327
   daemon and swap counts:
     woke=10, revs=10, scans=22876, obscans=8537, anscans=2215
     busy=0, freed=10736, reactivate=179, deactivate=26203
     pageouts=145, pending=2156, nswget=5
     nswapdev=1, swpgavail=1048575
     swpages=1048575, swpginuse=2301, swpgonly=2280, paging=16
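 For reference, both wait channels come from UVM's wait-for-the-
 pagedaemon path.  Simplified, and not a verbatim excerpt from
 sys/uvm/uvm_km.c or sys/uvm/uvm_fault.c, the stuck threads are in a
 loop of this shape:
 
 #include <uvm/uvm.h>
 
 static struct vm_page *
 alloc_one_page(struct uvm_object *obj, voff_t offset)
 {
 	struct vm_page *pg;
 
 	for (;;) {
 		pg = uvm_pagealloc(obj, offset, NULL, UVM_PGA_USERESERVE);
 		if (pg != NULL)
 			return pg;
 		/*
 		 * No free page: wake the pagedaemon and sleep on a
 		 * named channel -- "km_getwait2" here, "flt_noram5"
 		 * in uvm_fault's out-of-memory path.  If the
 		 * pagedaemon cannot free anything, the sleep never
 		 * ends, hence the soft hang.
 		 */
 		uvm_wait("km_getwait2");
 	}
 }
 
 So a thread showing one of those wait channels is blocked waiting for
 free pages that never arrive.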
 In the hard hang case the number of "free" pages would be much
 larger, so I suspect something else is running out of resources at
 that point; the very low free count here perhaps points to your
 comment about running out of free pages first.  I also noticed that
 the pool called "zio_data_buf_51", of size 1024, didn't grow much
 above 16,100 with this patch, as opposed to around 30,000 with the
 hard hang.  Limiting the number of vnodes (kern.maxvnodes; see the
 illustration below) didn't seem to affect the behavior of the softer
 hang.  I also noticed, though I am less certain of this, that the
 system was paging to swap even though all that was going on was a
 zfs receive over a ssh connection.
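 The vnode limit mentioned above is the kern.maxvnodes sysctl.  Purely
 as an illustration, the following is equivalent to running
 "sysctl -w kern.maxvnodes=N"; the value 50000 is an arbitrary
 example, not the limit actually used in the test.
 
 #include <sys/types.h>
 #include <sys/sysctl.h>
 #include <err.h>
 #include <stdio.h>
 
 int
 main(void)
 {
 	int cur, want = 50000;		/* arbitrary example value */
 	size_t len = sizeof(cur);
 
 	/* Read the current vnode limit. */
 	if (sysctlbyname("kern.maxvnodes", &cur, &len, NULL, 0) == -1)
 		err(1, "read kern.maxvnodes");
 	printf("kern.maxvnodes is %d\n", cur);
 
 	/* Lower it (requires appropriate privileges). */
 	if (sysctlbyname("kern.maxvnodes", NULL, NULL, &want,
 	    sizeof(want)) == -1)
 		err(1, "set kern.maxvnodes");
 	return 0;
 }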
 Brad Spencer - - KC8VKS -
