Subject: Re: NULLFS locking problem?
To: Bill Studenmund <wrstuden@zembu.com>
From: Feico Dillema <feico@pasta.cs.uit.no>
List: current-users
Date: 10/05/2000 16:21:35
On Tue, Oct 03, 2000 at 02:33:15PM -0700, Bill Studenmund wrote:
> On Thu, 7 Sep 2000, Feico Dillema wrote:
> 
> > For the past half year I have had problems on our server which seem to
> > indicate a locking problem in NULLFS. I cannot reliably reproduce it,
> > but now and then a filesystem (that is remounted with NULLFS
> > elsewhere) completely locks up. And my only solution has been to
> > reboot the machine when this happens (and now by getting rid of the
> > NULLFS mounts). The time between manifestations of this problem ranges
> > from a few hours to several weeks. We've ruled out hardware as the
> > problem source, and have seen the problem on various NetBSD kernel
> > versions (from -current as of end of last year to NetBSD-1.5_ALPHA and
> > NetBSD-1.5_ALPHA2).
> 
> Hmmm.... I'm way behind on my EMail. :-)
Thanks for replying anyway. 
 
> What mounts were causing problems? What was under the NULL mounts?
A normal ffs file system. The NULL mount was just used to remount it
read-only elsewhere for public access.

> See if you can get the filesystem locked up, and then do a ps -l. The
> important thing is to see what the processes are waiting on. If they are
> sleeping on a vnode lock, then the WCHAN will be vnlock. Those are the
> interesting ones.
> 
> The best thing would be to build a kernel with debug (so you'll get a
> netbsd.gdb), then get a core dump when the machine hangs. From digging
> through that, we can find out what the errant processes were doing.
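
For reference, the ps -l check suggested above could be scripted roughly
as follows. This is only a sketch: it assumes the ps output starts with a
header row that has PID and WCHAN columns and that COMMAND is the last
column. The names find_waiters and ps_long_listing are made up for
illustration, and the extra -ax flags (to list all processes) are an
addition to the plain ps -l mentioned above.

```python
import subprocess


def find_waiters(ps_output, wchan="vnlock"):
    """Return (pid, command) pairs for processes whose WCHAN equals `wchan`.

    Assumes `ps_output` begins with a header row naming the columns,
    including PID and WCHAN, with COMMAND as the final column.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")      # raises ValueError if the column is missing
    wchan_col = header.index("WCHAN")
    waiters = []
    for line in lines[1:]:
        # The last column (COMMAND) may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        if fields[wchan_col] == wchan:
            waiters.append((int(fields[pid_col]), fields[-1]))
    return waiters


def ps_long_listing():
    """Run ps in long format for all processes (flags are an assumption)."""
    result = subprocess.run(["ps", "-axl"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Calling find_waiters(ps_long_listing()) while the filesystem is hung would
list the processes sleeping on a vnode lock, which are the interesting
ones to look at in a core dump.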

As we are quite dependent on this server for our daily work and it
hosts all our public services (like www2.no.netbsd.org,
anoncvs.no.netbsd.org etc), I chose to remove the null-mounts for
now. It's not a machine I can play with a lot. In two weeks or so,
we'll set up a second server and then I may have some opportunity
to reproduce the problem again and get more debugging info. For now,
it's not much of an option.

Feico.