netbsd-bugs: Re: kern/36331: MP deadlock resembling ufs

Subject: Re: kern/36331: MP deadlock resembling ufs_hashlock deadlock
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Andrew Doran <ad@netbsd.org>
List: netbsd-bugs
Date: 05/17/2007 17:05:05

The following reply was made to PR kern/36331; it has been noted by GNATS.

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Sverre Froyen <sverre@viewmark.com>
Subject: Re: kern/36331: MP deadlock resembling ufs_hashlock deadlock
Date: Thu, 17 May 2007 18:00:47 +0100

 On Mon, May 14, 2007 at 09:05:00PM +0000, perseant@netbsd.org wrote:

 > I've been chasing down what I had thought was an LFS bug until I isolated
 > it last night, and it looks more like a multiprocessing locking bug to me.
 > Any thoughts on what is going on here would be helpful.

 On the code itself: there are two locks around the inode hash. ufs_hashlock
 serializes insertion of inodes into the hash (locks the code path).
 ufs_ihash_lock is taken by readers to prevent the hash from changing (locks
 the data structure). One thread is trying to lock in this order:

 	ufs_hashlock -> vnode lock

 The other thread is I think acquring them in the opposite order. There is
 contention and so they deadlock waiting for each other to release a lock.

 > vget(cf50a014,10002,94,0,c03c1250) at netbsd:vget+0x74
 >      ^^^^^^^^
 > ufs_ihashget(4,2,0,2,c03c8734) at netbsd:ufs_ihashget+0x94
 > ffs_vget()

 I think the solution is to never lock the vnode with ufs_hashlock held. One
 way is to change ufs_ihashget so that if the 'flags' argument is zero (not
 LK_EXCLUSIVE or something else), it should not try to lock the vnode. It
 should just drop ufs_ihash_lock and return the vnode address to indicate
 whether or not there is now an entry in the hash. Then we could change
 ffs_vget to look like this:

  retry:
 	/* If it's already in the hash, we win. */
  	if ((*vpp = ufs_ihashget(dev, ino, LK_EXCLUSIVE)) != NULL)
  		return (0);

  	/* Allocate a new vnode/inode. */
  	if ((error = getnewvnode(VT_UFS, mp, ffs_vnodeop_p, &vp)) != 0) {
  		*vpp = NULL;
  		return (error);
  	}
  	ip = pool_get(&ffs_inode_pool, PR_WAITOK);

  	/*
  	 * If someone beat us to it, put back the freshly allocated
 	 * vnode/inode pair and retry.
  	 */
  	mutex_enter(&ufs_hashlock);
  	if (ufs_ihashget(dev, ino, 0) != NULL) {
  		mutex_exit(&ufs_hashlock);
 		ungetnewvnode(vp);
 		pool_put(&ffs_inode_pool, ip);
 		goto retry;
  	}

 I can see about making this change but I don't have full Internet access
 at the moment. It will be at least a few days before that changes.

 Andrew