Subject: Re: nfsd: locking botch in op %d
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Frank van der Linden <fvdl@wasabisystems.com>
List: tech-kern
Date: 03/12/2001 16:28:44
On Sat, Mar 10, 2001 at 01:34:32PM -0500, der Mouse wrote:
> Looking at the routines those file/line pairs indicate, it appears to
> me that what's happening is more or less like this.
> 
> A LOOKUP happens on a device node.  ffs_vget is called, misses in the
> inode cache, and takes ufs_hashlock (the first B line, which is the
> do-while near the beginning of ffs_vget).  Everything proceeds
> normally, and it calls ufs_hashins() some 40 lines farther into
> ffs_vget; the unmatched D lock is taken in ufs_hashins(), and is on the
> affected inode's vnode.  ffs_vget then releases ufs_hashlock (the
> second B line).  Then another 35-40 lines farther down, ufs_vinit is
> called.  It promptly calls checkalias(), which finds the vnode aliased
> (the server apparently has a live vnode for its own sd0a, not
> surprising since its own root is on sd0a, and the client's sd0a is the
> same <major,minor> as the server's).  It appears that the lock
> ufs_hashins() took on the vnode gets lost somewhere in the checkalias()
> shuffle.

In the case of an aliased device node, ufs_vinit calls vput() on the
old vnode just before initializing the new one. vput() means unlock+deref.
This one doesn't seem to show up in your trace output, but the lockmgr()
call 2 lines below (still in ufs_vinit()) does. Seeing the second call
without the first should be pretty much impossible.
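
To make the bookkeeping concrete, here is a toy model in C. None of it is
kernel code: struct toy_vnode and every toy_* function are made up for
illustration, and the real vput() does more than this. It only shows how a
trace that logs the lock but not the unlock hidden inside vput() would
report an unmatched lock even though the vnode itself ends up unlocked.

```c
#include <assert.h>

/* Toy stand-in for a vnode; NOT the kernel's struct vnode. */
struct toy_vnode {
    int lockcount;  /* number of times the lock is currently held */
    int usecount;   /* reference count */
};

static void toy_lock(struct toy_vnode *vp)   { vp->lockcount++; }
static void toy_unlock(struct toy_vnode *vp) { assert(vp->lockcount > 0); vp->lockcount--; }
static void toy_ref(struct toy_vnode *vp)    { vp->usecount++; }
static void toy_rele(struct toy_vnode *vp)   { assert(vp->usecount > 0); vp->usecount--; }

/* The "unlock+deref" shorthand: unlock, then drop the reference. */
static void toy_vput(struct toy_vnode *vp)
{
    toy_unlock(vp);
    toy_rele(vp);
}

/*
 * Take a reference and a lock, then vput() the vnode.  Returns the number
 * of locks a hypothetical trace would report as unmatched.
 * trace_sees_vput != 0 means the trace logs the unlock done inside vput.
 */
static int toy_unmatched(int trace_sees_vput)
{
    struct toy_vnode vn = { 0, 0 };
    int traced = 0;             /* lock balance as seen by the trace */

    toy_ref(&vn);
    toy_lock(&vn);
    traced++;                   /* the lock is always logged */

    toy_vput(&vn);
    if (trace_sees_vput)
        traced--;               /* the unlock is logged only in this case */

    /* The vnode's real state is balanced either way. */
    assert(vn.lockcount == 0 && vn.usecount == 0);
    return traced;
}
```

In this model toy_unmatched(1) returns 0 and toy_unmatched(0) returns 1:
the phantom unmatched lock comes purely from the trace missing the vput().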

Are you sure that your trace takes the vput() into account?

- Frank