netbsd-bugs: Re: kern/36608: LFS related panic with LOCKDEBUG

Subject: Re: kern/36608: LFS related panic with LOCKDEBUG
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Sverre Froyen <sverre@viewmark.com>
List: netbsd-bugs
Date: 07/30/2007 23:05:05

The following reply was made to PR kern/36608; it has been noted by GNATS.

From: Sverre Froyen <sverre@viewmark.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/36608: LFS related panic with LOCKDEBUG
Date: Mon, 30 Jul 2007 16:03:04 -0600

 In lfs_vnops.c there is a comment about genfs_putpages stating:
 
  * (2) It needs to explicitly send blocks to be written when it is done.
  *     If VOP_PUTPAGES is called without the seglock held, we simply take
  *     the seglock and let lfs_segunlock wait for us.
  *     XXX There might be a bad situation if we have to flush a vnode while
  *     XXX lfs_markv is in operation.  As of this writing we panic in this
  *     XXX case.
 
 I have done a little more investigation, and I find that I consistently get a 
 double-lock panic on the vnode(?) that is locked immediately before the call 
 to lfs_segunlock, around line 2290 in lfs_vnops.c:
 
                         simple_unlock(&vp->v_interlock);
 
                         simple_lock(&vp->v_interlock);
                         write_and_wait(fs, vp, busypg, seglocked, NULL);
 *** vp is locked at this point
                         if (!seglocked) {
                                 lfs_release_finfo(fs);
                                 lfs_segunlock(fs);
 *** I get the panic before the call to lfs_segunlock returns
                         }
                         sp->vp = NULL;
                         goto get_seglock;
 
 It looks like lfs_segunlock is sleeping in the second while loop in this code 
 snippet from lfs_subr.c:
 
                 simple_lock(&fs->lfs_interlock);
                 while (ckp && sync && fs->lfs_iocount)
                         (void)ltsleep(&fs->lfs_iocount, PRIBIO + 1,
                                       "lfs_iocount", 0, &fs->lfs_interlock);
                 while (sync && sp->seg_iocount) {
                         (void)ltsleep(&sp->seg_iocount, PRIBIO + 1,
                                      "seg_iocount", 0, &fs->lfs_interlock);
                         DLOG((DLOG_SEG, "sleeping on iocount %x == %d\n", sp,
                               sp->seg_iocount));
                 }
                 simple_unlock(&fs->lfs_interlock);
 
 I do not know whether the comment above refers to the case I'm seeing, but 
 while lfs_segunlock is sleeping, some other code comes along and attempts to 
 lock the vnode that was locked in genfs_putpages.
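
 To make the suspected sequence concrete, here is a toy userspace model in C.
 It is not kernel code and is not meant as a fix. The toy_* names are
 hypothetical stand-ins, and the model assumes that v_interlock is still held
 when lfs_segunlock goes to sleep and that the sleep drops only the interlock
 it is handed (here lfs_interlock). Under those assumptions, any other code
 that tries to take v_interlock during the sleep finds it already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel simple lock; not the NetBSD implementation. */
struct toy_lock {
	bool held;
};

/* Returns false where a LOCKDEBUG kernel would report a locking error. */
static bool
toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void
toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/*
 * Toy sleep: drops only the interlock passed in, lets "other code" try to
 * take wanted_by_other, then retakes the interlock. Any other lock the
 * caller holds stays held for the whole sleep.
 * Returns true if the other code got its lock without conflict.
 */
static bool
toy_sleep(struct toy_lock *interlock, struct toy_lock *wanted_by_other)
{
	bool other_ok;

	toy_lock_release(interlock);
	other_ok = toy_lock_acquire(wanted_by_other);
	if (other_ok)
		toy_lock_release(wanted_by_other);
	(void)toy_lock_acquire(interlock);
	return other_ok;
}

/*
 * Models the reported path: take v_interlock, optionally drop it, then
 * sleep the way lfs_segunlock is assumed to. Returns true if the other
 * code could lock the vnode during the sleep.
 */
static bool
toy_putpages_path(bool drop_vnode_lock_first)
{
	struct toy_lock v_interlock = { false };
	struct toy_lock lfs_interlock = { false };
	bool ok;

	(void)toy_lock_acquire(&v_interlock);	/* before write_and_wait() */
	if (drop_vnode_lock_first)
		toy_lock_release(&v_interlock);

	(void)toy_lock_acquire(&lfs_interlock);	/* inside lfs_segunlock() */
	ok = toy_sleep(&lfs_interlock, &v_interlock);
	toy_lock_release(&lfs_interlock);

	if (!drop_vnode_lock_first)
		toy_lock_release(&v_interlock);
	return ok;
}
```

 In this model toy_putpages_path(false) reports a conflict, which matches the
 panic I am seeing, while toy_putpages_path(true) does not. Whether dropping
 the lock at that point is safe in the real code is a separate question.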