Subject: kern/2258: Mounting nullfs over NFS repeatably panics vnode_pager_uncache
To: None <gnats-bugs@NetBSD.ORG>
From: None <jonathan@DSG.Stanford.EDU>
List: netbsd-bugs
Date: 03/24/1996 18:28:39
>Number:         2258
>Category:       kern
>Synopsis:       Mounting nullfs over NFS repeatably panics vnode_pager_uncache
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 24 21:50:01 1996
>Last-Modified:
>Originator:     Jonathan Stone
>Organization:
Stanford DSG
>Release:        1.1
>Environment:
System: NetBSD Bowl.DSG.Stanford.EDU 1.1B NetBSD 1.1B (DSG) #0: Sun Mar 17 20:44:01 PST 1996 jonathan@Bowl.DSG.Stanford.EDU:/usr/src/sys/arch/i386/compile/DSG i386


>Description:

vnode_pager_uncache() panics when unmounting null filesystems mounted
over an NFS filesystem, in a kernel where DEBUG is defined.
I see two problems with this:

	a.  The situation for which the DEBUG code is checking
	    should, perhaps (I don't know for sure) be handled
	    by the filesystem underlying the NULL filesystem,
	    not the NULL filesystem itself.  Certainly,
	    if NFS filesytems need to be special-cased, then
	    NULL-over-NFS filesystems seem to need to be
	    special-cased, too.

	b. panicking, in that situation, is appropriate for
	   a kernel configured with DIAGNOSTIC,, but not for a
	   kernel configured with  DEBUG.
	   "DEBUG isn't supposed to do that".

Due to my own local setup, this panic happens _every_ _time_ I reboot,
which is unpleasant, to say the least.



>How-To-Repeat:

Put NetBSD source on an NFS server.
Mount that NFS server, on a NetBSD client,  somewhere other
than /usr/src. 

Use nullfs to mount the source server under /usr/src, to get a
"correct" canonical pathname into the kernel ident string and
debugging source pathnames into binaries (e.g., for making
distributions or snapshots.

Build a kernel (or do anything that involves unlinking a file.
Reboot.  Note the panic and subsequent fscks.   This gets boring
very fast, especially when debugging kernels.


>Fix:

The following patch below has worked for me for months. (I thought
I send-pr'ed it in Feburary, but I can't find it in the database.)

The patch changes the panic() to a printf(), unless DIAGNOSTIC
is defined; and extends the special-case check for filesystems
that are "expected" to have unlocked vnodes, to include the null
filesystem.


*** vnode_pager.c.DIST	Sat Feb 10 04:35:05 1996
--- vnode_pager.c.dsg	Mon Feb 12 12:07:35 1996
***************
*** 464,470 ****
  	pager = (vm_pager_t)vp->v_vmdata;
  	if (pager == NULL)
  		return (TRUE);
! #ifdef DEBUG
  	if (!VOP_ISLOCKED(vp)) {
  #ifdef NFSCLIENT
  		extern int (**nfsv2_vnodeop_p) __P((void *));
--- 464,470 ----
  	pager = (vm_pager_t)vp->v_vmdata;
  	if (pager == NULL)
  		return (TRUE);
! #if defined(DIAGNOSTIC) || defined(DEBUG)
  	if (!VOP_ISLOCKED(vp)) {
  #ifdef NFSCLIENT
  		extern int (**nfsv2_vnodeop_p) __P((void *));
***************
*** 472,489 ****
  #ifdef FIFO
  		extern int (**fifo_nfsv2nodeop_p) __P((void *));
  #endif
  
  		if (vp->v_op != nfsv2_vnodeop_p
  		    && vp->v_op != spec_nfsv2nodeop_p
  #ifdef FIFO
  		    && vp->v_op != fifo_nfsv2nodeop_p
! #endif
  		    )
  
! #endif
  			panic("vnode_pager_uncache: vnode not locked!");
! 	}
  #endif
  	/*
  	 * Must use vm_object_lookup() as it actually removes
  	 * the object from the cache list.
--- 472,502 ----
  #ifdef FIFO
  		extern int (**fifo_nfsv2nodeop_p) __P((void *));
  #endif
+ #ifdef NULLFS
+ 		extern int (**null_vnodeop_p)();
+ #endif
  
  		if (vp->v_op != nfsv2_vnodeop_p
  		    && vp->v_op != spec_nfsv2nodeop_p
  #ifdef FIFO
  		    && vp->v_op != fifo_nfsv2nodeop_p
! #endif	/*FIFO*/
! #ifdef NULLFS
! 		    && vp->v_op != null_vnodeop_p	/*XXX too inclusive?*/
! #endif	/*NULLFS*/
  		    )
  
! #endif	/*NFSCLIENT*/
! 
! #ifdef DIAGNOSTIC
  			panic("vnode_pager_uncache: vnode not locked!");
! #else
! 			printf("vnode_pager_uncache: vnode not locked!");
  #endif
+ 	}
+ 
+ #endif	/* defined(DIAGNOSTIC) || defined(DEBUG) */
+ 
  	/*
  	 * Must use vm_object_lookup() as it actually removes
  	 * the object from the cache list.
>Audit-Trail:
>Unformatted: