Subject: kern/1524: union_dircache has race that can give duplicate dircache to two union nodes
To: None <gnats-bugs@gnats.netbsd.org>
From: John Kohl <jtk@kolvir.arlington.ma.us>
List: netbsd-bugs
Date: 09/26/1995 22:03:07
>Number:         1524
>Category:       kern
>Synopsis:       union_dircache has race that can give duplicate dircache to two union nodes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 26 22:20:02 1995
>Last-Modified:
>Originator:     John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release:        NetBSD-current, 1995/09/24
>Environment:
	
System: NetBSD pattern 1.0A NetBSD 1.0A (PATTERN) #131: Mon Sep 25 23:51:33 EDT 1995 jtk@pattern:/u1/NetBSD-current/src/sys/arch/i386/compile/PATTERN i386


>Description:
	It appears that union_dircache() can host a race condition that
lets two different processes reading the same directory share the same
dircache stack, which wreaks havoc when the second one tries to release
and free up the vnode stack.
>How-To-Repeat:
	I think you have to have process A get into union_dircache()
while some other process (C) has the first underlying directory locked.  A
then blocks at the VOP_LOCK() call in union_dircache().
Process B then enters union_dircache(), and finds dircache attached to
the vnode, so it happily takes it and searches for the next VP to use.
It also blocks on the VOP_LOCK().  Process C then unlocks the vnode, and
both A & B believe they've got their own dircache stack, and they
each proceed to allocate a new union vnode for their readdir processing.

Eventually, both special vnodes get reclaimed and the second one tries
to union_diruncache() all its underlying nodes and trips.

You then get a panic of this variety:
kernel: page fault trap, code=0
Stopped at      _vrele+0x18:    movw    0x4(%edx),%ax
db> tr
_vrele(deadbeef) at _vrele+0x18
_union_diruncache(f8837300) at _union_diruncache+0x1f
_union_inactive(f9e2de78,0,f81d1694,f87b4f00,f9e2deac) at _union_inactive+0x14
_vrele(f87b4f00,f87cbf00,f88327c0,0,f81d130c) at _vrele+0x82
_vn_close(f87b4f00,5,f8822e80,f87cbf00,f9e2df10) at _vn_close+0x4d
_vn_closefile(f88327c0,f87cbf00,f8834500,f8834570,0) at _vn_closefile+0x19
_closef(f88327c0,f87cbf00,f7bf8774,f87cbf00,4) at _closef+0x136
_fdrelease(f87cbf00,0,f9e2dfa4,f8195905,f87cbf00) at _fdrelease+0x8d
_close(f87cbf00,f9e2df84,f9e2df7c,0,73908) at _close+0x1a
_syscall() at _syscall+0x239
--- syscall (number 6) ---
0x1009ac2f:


>Fix:
	4.4BSD-Lite Release 2 locks the top-level vnode (argument VP)
during the entire body of union_dircache().  This should do the trick:

===================================================================
RCS file: RCS/union_subr.c,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 union_subr.c
*** union_subr.c	1995/06/29 02:09:34	1.1.1.1
--- sys/miscfs/union/union_subr.c	1995/09/27 01:17:33
***************
*** 1040,1051 ****
  	struct vnode *vp;
  {
  	int cnt;
! 	struct vnode *nvp;
  	struct vnode **vpp;
! 	struct vnode **dircache = VTOUNION(vp)->un_dircache;
! 	struct union_node *un;
  	int error;
  
  	if (dircache == 0) {
  		cnt = 0;
  		union_dircache_r(vp, 0, &cnt);
--- 1040,1052 ----
  	struct vnode *vp;
  {
  	int cnt;
! 	struct vnode *nvp = NULLVP;
  	struct vnode **vpp;
! 	struct vnode **dircache;
  	int error;
  
+ 	VOP_LOCK(vp);
+ 	dircache = VTOUNION(vp)->un_dircache;
  	if (dircache == 0) {
  		cnt = 0;
  		union_dircache_r(vp, 0, &cnt);
***************
*** 1067,1082 ****
  	}
  
  	if (*vpp == NULLVP)
! 		return (NULLVP);
  
  	VOP_LOCK(*vpp);
  	VREF(*vpp);
  	error = union_allocvp(&nvp, vp->v_mount, NULLVP, NULLVP, 0, *vpp, NULLVP, 0);
! 	if (error)
! 		return (NULLVP);
! 	VTOUNION(vp)->un_dircache = 0;
! 	VTOUNION(nvp)->un_dircache = dircache;
! 
  	return (nvp);
  }
  
--- 1068,1084 ----
  	}
  
  	if (*vpp == NULLVP)
! 		goto out;
  
  	VOP_LOCK(*vpp);
  	VREF(*vpp);
  	error = union_allocvp(&nvp, vp->v_mount, NULLVP, NULLVP, 0, *vpp, NULLVP, 0);
! 	if (!error) {
! 		VTOUNION(vp)->un_dircache = 0;
! 		VTOUNION(nvp)->un_dircache = dircache;
! 	}
! out:
! 	VOP_UNLOCK(vp);
  	return (nvp);
  }
  

>Audit-Trail:
>Unformatted: