Current-Users archive


Re: random lockups (now suspecting zfs)



On Sat, 4 Nov 2023, Simon Burge wrote:
Hi Greg,

Greg Troxel wrote:

On Fri, Oct 20, 2023 at 01:11:15PM -0400, Greg Troxel wrote:
A different machine has locked up, running recent netbsd-10.  I was
doing pkgsrc rebuilds in zfs, in a dom0 with 4G of RAM, with 8G total
physical.  It has a private patch to reduce the amount of memory used
for ARC, which has been working well.

Are you still seeing the problem below even with limiting the amount of
memory ARC can use?

All 3 tmux windows show something like

  [ 373598.5266510] load: 0.00  cmd: bash 21965 [flt_noram5] 0.37u 2.89s 0% 6396k

and I can switch among them and ^T, but trying to run top is stuck (in
flt_noram5).  I'll give it an hour or so, and have a look at the
console.

I've seen cc1plus processes wedged in either flt_noram or tstile after
doing multiple builds, and a reboot is the only way out.  I'm using ZFS
for everything except swap and some mostly-unused media files that live
on an FFS.

So to me this feels like a locking botch in a rare path in zfs.

This appears to be the case.  Chuck Silvers has some understanding of
the problem and I'm helping test, but at this stage there isn't a fix
available. :/

It's interesting that you see the lockups during pkgsrc builds, i.e.
during a period with lots of file creation. We use zfs on backup
systems that pull in data with rsync. During the initial runs (where
every file is new) we usually get a couple of lockups, but during
day-to-day operation (few changes) it is reliable. These are physical
and virtual machines running NetBSD 9 that obey the rule of thumb of
1GB of RAM per TB of storage, with no patches besides setting MAXPHYS
in the module to 32k for Xen.

--
Stephen


