Subject: Re: Strange hang on 4.99.49
To: Paul Goyette <paul@whooppee.com>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 01/23/2008 19:28:02
On Tue, Jan 22, 2008 at 12:48:38PM -0800, Paul Goyette wrote:
> I'm running a kernel + userland from sources as of just a few hours ago.
> 
> I started up a build.sh with -j 4 and all of a sudden, after an hour or 
> more of running, the system just went completely idle even though the 
> build was nowhere near complete.
> 
> Looking around, I found three cc1 processes (only 3, not 4).  One of 
> them was in a 'vnode' state while the other two were in 'vnlock'.  Both 
> my /usr/src and /usr/obj are null-mounts:
> 
> 	/dev/wd0a on / type ffs (NFS exported, local)
> 	/dev/wd0e on /var type ffs (local)
> 	/dev/wd0g on /home type ffs (NFS exported, local)
> >>>	/dev/wd0h on /build type ffs (soft dependencies, NFS exported, local)
> 	/dev/wd1a on /amanda type ffs (local)
> >>>	/build/src on /usr/src type null (local)
> >>>	/build/obj on /usr/obj type null (local)
> 	/build/xsrc on /usr/xsrc type null (local)
> 	/build/pkgsrc on /usr/pkgsrc type null (local)
> 	kernfs on /kern type kernfs (local)
> 	ptyfs on /dev/pts type ptyfs (local)
> 	tmpfs on /tmp type tmpfs (local)
> 
> Is there some sort of locking race condition that I've managed to hit? 
> If so, is there a way to avoid it?

I think it's a lock order reversal, and it occurs with nullfs. I can
reproduce it on a test machine here and am looking into it.
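
For anyone unfamiliar with the term, a lock order reversal in general
looks roughly like the userland sketch below. It is an illustration
only and not the kernel code path in question: lock_a, lock_b and the
function name are made up, and the second acquire is non-blocking so
the demo reports the conflict instead of hanging the way the build did.

```python
import threading


def demo_lock_order_reversal():
    """Show two threads taking the same two locks in opposite order.

    Returns a dict mapping thread name to whether it obtained its
    second lock. All names here are hypothetical.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    sync = threading.Barrier(2)   # reusable rendezvous point for both threads
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            sync.wait()
            # A blocking acquire here would deadlock: each thread holds
            # the lock the other one wants. Try without blocking instead.
            ok = second.acquire(blocking=False)
            got_second[name] = ok
            # Keep the first lock held until both threads have tried.
            sync.wait()
            if ok:
                second.release()

    t1 = threading.Thread(target=worker, args=("A", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("B", lock_b, lock_a))  # reversed order
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second


if __name__ == "__main__":
    print(demo_lock_order_reversal())
```

Neither thread gets its second lock. With blocking acquires both would
sleep forever, which is why the usual fix is to make every path take
the locks in the same order.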

Thanks,
Andrew