Subject: kern/4965: lfs_bmapv and lfs_markv lossage
To: None <gnats-bugs@gnats.netbsd.org, perseant@hhhh.org>
From: None <perseant@hhhh.org>
List: netbsd-bugs
Date: 02/09/1998 19:50:52
>Number:         4965
>Category:       kern
>Synopsis:       vinvalbuf panics from within lfs_markv, and ufs_lock from lfs_bmapv
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb  9 20:05:02 1998
>Last-Modified:
>Originator:     Konrad Schroder
>Organization:
-------------------------------------------------------------------------
Konrad Schroder           http://www.hitl.washington.edu/people/perseant/
System Administrator                         perseant@hitl.washington.edu
Human Interface Technology Lab                      Voice: (206) 616-1478
Box 352142, University of Washington, 98195, USA      FAX: (206) 543-5380

>Release:        1.3 (+ patches for kern/4641 and kern/4812)
>Environment:
System: NetBSD inle 1.3 NetBSD 1.3 (INLE) #77: Sun Feb 8 18:49:33 PST 1998 perseant@inle:/usr/src/sys/arch/i386/compile/INLE i386

>Description:
    vclean() sets VXLOCK, then calls vinvalbuf.

    vinvalbuf calls VOP_FSYNC, expecting that this will clear out all of
    the dirty buffers associated with the vnode.  In the LFS, this
    eventually calls lfs_segwrite.

    lfs_segwrite in turn calls vget() on every dirty vnode except those
    that are VXLOCKed (which means it skips the very vnode we are trying
    to invalidate), flushes the rest to disk, and returns, thinking its
    job is done.

    Then vinvalbuf panics: not all buffers got flushed from the locked
    vnode, because, of course, it was locked.
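    The interaction above can be simulated in user space.  The sketch
    below (Python, with Vnode, vclean, etc. as illustrative stand-ins
    for the kernel structures, not the actual NetBSD code) shows why
    the panic is unavoidable once the VXLOCKed vnode is skipped:

```python
VXLOCK = 0x0100  # matches the flag value in sys/vnode.h

class Vnode:
    def __init__(self, name):
        self.name = name
        self.flags = 0
        self.dirty_bufs = 3  # pretend there are dirty buffers

def lfs_segwrite(vnodes):
    """Flush every dirty vnode it can reference; VXLOCKed ones are skipped."""
    for vp in vnodes:
        if vp.flags & VXLOCK:
            continue  # lfs_vref() fails on locked vnodes
        vp.dirty_bufs = 0

def vinvalbuf(vp, all_vnodes):
    # VOP_FSYNC on an LFS eventually lands in lfs_segwrite...
    lfs_segwrite(all_vnodes)
    # ...which was expected to have flushed vp's buffers.
    if vp.dirty_bufs != 0:
        raise RuntimeError("panic: vinvalbuf: flush failed")

def vclean(vp, all_vnodes):
    vp.flags |= VXLOCK          # lock the vnode first...
    vinvalbuf(vp, all_vnodes)   # ...then try to invalidate its buffers

vnodes = [Vnode("a"), Vnode("b")]
try:
    vclean(vnodes[0], vnodes)
except RuntimeError as e:
    print(e)  # the locked vnode's buffers were never flushed
```

    Note that the other vnode's buffers do get flushed; only the
    VXLOCKed one is left dirty, which is exactly the state that
    vinvalbuf() treats as a panic.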

>How-To-Repeat:
    Try using an LFS for a while with DIAGNOSTIC on, or...
        
    [Create a 96 Mb LFS partition on sd1c -- I use a Zip disk]
    newlfs -L /dev/rsd1c
    mount -t lfs -o-n /dev/sd1c /lfs
    (cd /usr/src; tar cf - gnu) | (cd /lfs; tar xf -)
    [Cause a segment to be cleaned -- I've adapted lfs_cleanerd:
    lfs_cleanerd -a /lfs ]
    
>Fix:
    This solution involves marking the vnode with a special flag,
    VLFSFLUSH, so that the LFS callback code works around the VXLOCK
    when flushing that particular vnode to disk.  When a check for
    VLFSFLUSH is added to lfs_vref(), a matching check also has to be
    added to lfs_vunref(), so that we do not vrele() more times than
    we took references.

    [Would it be useful to have an extra field in the vnode for
    fs-specific flags like this (and VDIROP), even if it is not known
    that an inode or equivalent exists?]
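    The reference-count pairing that the fix depends on can be modeled
    as follows (again a Python stand-in for illustration, not kernel
    code): with VLFSFLUSH set, lfs_vref() bumps v_usecount directly
    instead of failing on VXLOCK, so lfs_vunref() must undo exactly
    that bump rather than fall through to vrele(), or the use count
    would be decremented one time too many.

```python
VXLOCK    = 0x0100
VLFSFLUSH = 0x2000  # LFS: vnode is being flushed

class Vnode:
    def __init__(self):
        self.flags = 0
        self.usecount = 1

def lfs_vref(vp):
    """Return 0 on success, 1 if the vnode cannot be referenced."""
    if vp.flags & VLFSFLUSH:
        # This vnode is being flushed on purpose; work around VXLOCK.
        vp.usecount += 1
        return 0
    if vp.flags & VXLOCK:
        return 1
    vp.usecount += 1  # stands in for vget()
    return 0

def lfs_vunref(vp):
    if vp.flags & VLFSFLUSH:
        # Undo only our own bump; a plain vrele() here would run the
        # full release machinery a second time later on.
        vp.usecount -= 1
        return
    vp.usecount -= 1  # stands in for vrele()

vp = Vnode()
vp.flags |= VXLOCK | VLFSFLUSH   # being cleaned, and flagged for flush
if lfs_vref(vp) == 0:            # succeeds despite VXLOCK
    lfs_vunref(vp)
print(vp.usecount)  # back where it started: the counts stay balanced
```

    Without the matching check in lfs_vunref(), the flush path would
    take the VLFSFLUSH shortcut on the way in but the ordinary vrele()
    path on the way out, leaving v_usecount one too low.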

*** ./sys/vnode.h	Mon Feb  9 18:27:09 1998
--- ./sys/vnode.h.fixed	Sat Jan 31 00:00:27 1998
***************
*** 121,124 ****
--- 121,125 ----
  #define VALIASED        0x0800  /* vnode has an alias */
  #define VDIROP          0x1000  /* LFS: vnode is involved in a directory op */
+ #define VLFSFLUSH       0x2000  /* LFS: vnode is being flushed */

  /*
*** ./ufs/lfs/lfs_segment.c	Tue Jan 20 19:09:18 1998
--- ./ufs/lfs/lfs_segment.c.fixed.2	Sun Feb  8 18:42:23 1998
***************
*** 120,127 ****
  	struct lfs *fs;
  	struct segment *sp;
  
  	fs = VFSTOUFS(vp->v_mount)->um_lfs;
! 	if (fs->lfs_nactive > MAX_ACTIVE)
! 		return(lfs_segwrite(vp->v_mount, SEGM_SYNC|SEGM_CKP));
  	lfs_seglock(fs, SEGM_SYNC);
  	sp = fs->lfs_sp;
--- 120,134 ----
  	struct lfs *fs;
  	struct segment *sp;
+         int tmp;
+ 
+         /* Protect against VXLOCK deadlock in vinvalbuf() */
+         vp->v_flag |= VLFSFLUSH;
  
  	fs = VFSTOUFS(vp->v_mount)->um_lfs;
! 	if (fs->lfs_nactive > MAX_ACTIVE) {
!             tmp = lfs_segwrite(vp->v_mount, SEGM_SYNC|SEGM_CKP);
!             vp->v_flag &= ~VLFSFLUSH;
!             return tmp;
!         }
  	lfs_seglock(fs, SEGM_SYNC);
  	sp = fs->lfs_sp;
***************
*** 148,151 ****
--- 155,159 ----
  #endif
  	lfs_segunlock(fs);
+         vp->v_flag &= ~VLFSFLUSH;
  	return (0);
  }
***************
*** 1096,1100 ****
  	register struct vnode *vp;
  {
! 
  	if (vp->v_flag & VXLOCK)
  		return(1);
--- 1104,1116 ----
  	register struct vnode *vp;
  {
!         /*
!          * If we return 1 here, we risk vinvalbuf() not being able to
!          * flush all of the pages from this vnode, which will cause
!          * a panic during lfs_markv.  So, return 0 if that is in progress.
!          */
!         if (vp->v_flag & VLFSFLUSH) {
!                 vp->v_usecount++;
!                 return(0);
!         }
  	if (vp->v_flag & VXLOCK)
  		return(1);
***************
*** 1108,1111 ****
--- 1124,1135 ----
  	extern int lfs_no_inactive;
  
+         /*
+          * We also don't want to vrele() here since that will be done
+          * again later, causing us serious problems.
+          */
+         if(vp->v_flag & VLFSFLUSH) {
+             vp->v_usecount--;
+             return;
+         }
  	/*
  	 * This is vrele except that we do not want to VOP_INACTIVE

>Audit-Trail:
>Unformatted: