Re: continued zfs-related lockups



I have published a repository related to debugging/understanding ZFS on
NetBSD:

  https://codeberg.org/gdt/netbsd-zfs

This has:

  a patch that
    - adds comments
    - rototills ARC sizing (smaller)
    - disables prefetching
    - adds printfs of ARC eviction behavior (when interesting)

  a script to create many files and rm them

  a program to allocate lots of RAM, to provoke memory pressure
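
The memory-pressure program is conceptually just the following (a
minimal sketch, not the repo's actual code; the size argument and the
4 KiB page-touch stride are assumptions):

    /* Allocate N MiB and touch every page so it is really resident. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1024;
            size_t len = mb << 20;
            char *p = malloc(len);

            if (p == NULL) {
                    perror("malloc");
                    return 1;
            }
            /* Touch each page; otherwise malloc stays lazy. */
            for (size_t off = 0; off < len; off += 4096)
                    p[off] = 1;
            printf("holding %zu MiB; interrupt to release\n", mb);
            pause();
            return 0;
    }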


My current thinking (a little fuzzy, and influenced by many helpful
comments) is:

  The ARC is a cache from a 128-bit "Disk Virtual Address" (DVA) to
  block contents.  I think this DVA space addresses all pools, or all
  disks backing pools.
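
  For reference, in the OpenZFS sources a DVA is two packed 64-bit
  words; the layout below is paraphrased from the spa.h comments, so
  treat it as background rather than something the patch touches:

      /* From the ZFS sources: a DVA is 128 bits. */
      typedef struct dva {
              uint64_t dva_word[2];
      } dva_t;
      /*
       * word 0 packs: vdev id | GRID | ASIZE
       * word 1 packs: gang-block flag (1 bit) | offset (63 bits)
       */

  Since the vdev id is an index into one pool's vdev array, a DVA
  reads as pool-relative rather than global across pools.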

  The ARC divides the world into "data", meaning bits that are file
  contents, and "metadata", which is the kitchen sink of everything
  else.
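
  That split is literal in the code: every ARC buffer is tagged with
  one of these (as in the OpenZFS sources; names may differ slightly
  in our tree):

      typedef enum arc_buf_contents {
              ARC_BUFC_INVALID,
              ARC_BUFC_DATA,          /* file contents */
              ARC_BUFC_METADATA,      /* dnodes, directories, ... */
              ARC_BUFC_NUMTYPES
      } arc_buf_contents_t;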

  ARC management is basically OK, except for the concept of lots of
  things in the ARC that cannot be freed by the evict thread.

  One has to be really careful not to printf at speed.  Perhaps
  because of the framebuffer, the kernel becomes CPU-bound and never
  really recovers.  With the printfs as enabled in the patch in the
  repo, things are OK.
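
  The printfs in the patch are only emitted "when interesting";
  another way to keep such printfs safe at speed is NetBSD's
  ratecheck(9), along these lines (a sketch; "freed" and the function
  name are stand-ins for whatever gets reported):

      #include <sys/time.h>

      static struct timeval arc_report_last;
      static const struct timeval arc_report_min = { 1, 0 };

      static void
      arc_report(uint64_t freed)
      {
              /* Allow at most one eviction report per second. */
              if (ratecheck(&arc_report_last, &arc_report_min))
                      printf("arc: evict pass freed %" PRIu64 " bytes\n",
                          freed);
      }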

  /etc/daily is rough on the system.  With too little RAM and
  kern.maxvnodes too high (the default maxvnodes on a 6G VM), it will
  lock the system up.

  vnodes/znodes hold a "dnode_t" (allocated from a kernel pool), and
  this memory is accounted for as being "in" the ARC.  But it isn't
  really in, in the sense that the ARC drain routines can't free it.
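
  As I understand the reference chain (simplified; the real structs go
  through dbuf/SA handles with more indirection):

      /*
       * Why a cached vnode pins a dnode_t:
       *
       *   struct vnode --v_data--> znode_t --dbuf hold--> dnode_t
       *
       * The dnode_t is charged to the ARC's metadata accounting,
       * but the evict thread can only drop buffers with no external
       * holds.  While the vnode stays cached the hold stays, so the
       * dnode_t is "in" the ARC for accounting purposes yet
       * invisible to eviction.
       */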

  Because of this, under pressure the ARC is mostly emptied of the
  things that can be freed.  But the things that cannot be freed make
  up most of it.

  I suspect that keeping the dnode_t attached to vnodes may not be
  needed.  Perhaps the on-disk directory information is already in the
  ARC, because it is addressed by DVA.  So it's not clear whether
  what's being avoided is the parsing work or the read from disk.

  Having kern.maxvnodes high leads to trouble.  Lowering it, and
  disabling prefetch, makes things much better.
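
  (Concretely: "sysctl -w kern.maxvnodes=100000" takes effect at
  runtime; that value is only an example.  Prefetch is disabled by the
  patch in the repo.)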

  FreeBSD has a dnlc drain routine.  We probably need a way to reduce
  the number of zfs vnodes under pressure.  But maybe it isn't really
  just about zfs; perhaps vnodes in general should be freed.
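
  For comparison, OpenZFS on Linux and FreeBSD has an ARC "prune"
  hook: the filesystem registers a callback, and the ARC invokes it
  when metadata won't shrink, so the VFS layer can shed znodes/vnodes.
  The shape of it, from the OpenZFS sources (the callback name here is
  a placeholder; whether our tree wires this up is exactly the open
  question):

      /* Registered once when the filesystem is mounted/loaded: */
      arc_prune_t *zp = arc_add_prune_callback(zfs_prune_cb, NULL);

      /* Called back by the ARC under metadata pressure: */
      static void
      zfs_prune_cb(int64_t nr_to_scan, void *arg)
      {
              /* walk cached zfs vnodes, recycling up to nr_to_scan */
      }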

  Despite all of the above, it seems that over time memory is leaked,
  and the system becomes memory-stressed.  Eventually, as in a week or
  so, even a system with 32G of RAM locks up, if one does cvs updates
  of src and pkgsrc, pkg_rr, building releases, etc.  I suspect the
  dnode_t accumulation described above.


If you are running zfs on even a 32G system, with most of your data in
zfs, and you cvs update pkgsrc/src, rebuild packages, and rebuild NetBSD
via build.sh, and **you can keep the system running for 30 days**,
please let me know.  Please include memory size and kern.maxvnodes.

Right now on a 32GB system, I have kern.maxvnodes at 300000, and the
number of live dnode_t objects is higher than that, at 319625, per
vmstat -m:

Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
dnode_t      632  1286110    0   966485 87779 32622 55157 87779     0   inf 1886
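
(The object count is Requests minus Releases: 1286110 - 966485 =
319625.  At 632 bytes each that is roughly 193 MiB of live dnode_t,
and the 55157 pages the pool holds come to roughly 215 MiB assuming
4 KiB pages, so the pool itself is not badly fragmented; the problem
is the sheer count.)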


It might be that with that maxvnodes value, a smaller ARC, and no
prefetch, the system will now stay up.  I'm 4 days in.

(In contrast, systems not running zfs stay up until I reboot them.)

