Subject: kern/796: another union fs locking violation
To: None <gnats-admin@NetBSD.ORG>
From: John Kohl <jtk@kolvir.blrc.ma.us>
List: netbsd-bugs
Date: 02/11/1995 15:50:04
>Number:         796
>Category:       kern
>Synopsis:       two processes can race and deadlock in the union fs code
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 11 15:50:03 1995
>Originator:     John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release:        1.0-current
>Environment:
	
System: NetBSD kolvir 1.0A NetBSD 1.0A (KOLVIR) #31: Sat Feb 11 13:13:13 EST 1995 jtk@kolvir:/u1/NetBSD-current/src/sys/arch/i386/compile/KOLVIR i386

>Description:
I have two processes deadlocked, each waiting for the other to release a
vnode lock.

Both processes are trying to allocate a cover vnode for the same
underlying directory.

Process A's stack looks like:

ufs_lock(topvnode)
vget(topvnode)
ufs_lookup()
union_lookup1()
union_lookup()		looking up ../../../../arch/sys/param.h
					    ^ parse ptr here
lookup()
namei()
...

Process B's stack looks like:
sleep()
union_allocvp()
union_lookup()		looking up /u1/NetBSD-current/src/sys/arch/i386
							      ^ parse ptr here
lookup()
namei()
...

Process B has the upper vnodes for both /u1/NetBSD-current/src/sys and
/u1/NetBSD-current/src/sys/arch locked.  It looks like it had just
looked up "arch" in sys, getting it back locked [via VOP_LOOKUP() in
union_lookup1()], and is now preparing to union-ize it: it is sleeping
in union_allocvp(), waiting on the union node lock.

Process A has the union version of "arch" locked (since it's in the
lookup routine), and is trying to look up `..' in `arch', using
VOP_LOOKUP() on the upper layer.  It found `..' in the name cache, and
tried to do a vget() to lock it and claim it from any free list it might
be on.  vget() then tries to lock the UFS vnode for sys, which process B
already holds.
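
The cycle can be written down as a small wait-for model.  This is only a
sketch of the state described above, not kernel code: the lock and
process names, struct waitstate, kern796_state() and deadlocked() are
all hypothetical, and the model assumes each process sleeps on at most
one lock.

```c
#include <assert.h>

/* Hypothetical model of this report; none of these names exist in the kernel. */
enum { PROC_A, PROC_B, NPROCS };
enum { UPPER_SYS, UPPER_ARCH, UNION_ARCH, NLOCKS };
enum { NONE = -1 };

struct waitstate {
	int owner[NLOCKS];	/* process holding each lock, or NONE */
	int wants[NPROCS];	/* lock each process sleeps on, or NONE */
};

/* Build the lock state as described in the report. */
struct waitstate
kern796_state(void)
{
	struct waitstate ws;

	ws.owner[UPPER_SYS] = PROC_B;	/* B locked sys ... */
	ws.owner[UPPER_ARCH] = PROC_B;	/* ... and arch via VOP_LOOKUP() */
	ws.owner[UNION_ARCH] = PROC_A;	/* A is inside union_lookup() */
	ws.wants[PROC_A] = UPPER_SYS;	/* vget() on `..' from the name cache */
	ws.wants[PROC_B] = UNION_ARCH;	/* sleep() in union_allocvp() */
	return ws;
}

/*
 * Follow "waits for the owner of" edges starting at proc.
 * Return 1 if the chain leads back to proc, 0 otherwise.
 */
int
deadlocked(const struct waitstate *ws, int proc)
{
	int cur = proc;
	int steps;

	assert(proc >= 0 && proc < NPROCS);
	for (steps = 0; steps < NPROCS; steps++) {
		int lock = ws->wants[cur];

		if (lock == NONE)
			return 0;	/* cur is runnable, so no cycle */
		cur = ws->owner[lock];
		if (cur == NONE)
			return 0;	/* lock is free, so no cycle */
		if (cur == proc)
			return 1;	/* chain closed on itself */
	}
	return 0;
}
```

With the state from kern796_state(), deadlocked() reports a cycle for
both processes; clearing process B's wait breaks it.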

>How-To-Repeat:
Run emacs and a kernel build at the same time.

>Fix:
You must always take union node/underlying node locks in the same order,
even when constructing such nodes.  The name cache is getting in the
way.
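
To illustrate the ordering rule, here is a minimal sketch of a lock-order
checker.  It is not a patch.  It ASSUMES the order is "union node first,
then underlying vnode", which this report does not establish; the
opposite order would work equally well as long as every path uses it.
The names lock_ctx, ordered_acquire() and ordered_release() are
hypothetical.

```c
#include <assert.h>

/* Assumed order (not taken from the kernel): lower rank is locked first. */
enum lock_rank { RANK_UNION_NODE, RANK_UPPER_VNODE, NRANKS };

/* Per-process bookkeeping: how many locks of each rank are held. */
struct lock_ctx {
	int held[NRANKS];
};

/*
 * Record an acquisition of a lock of the given rank.
 * Return -1 without recording it if a lock of a higher rank is already
 * held, i.e. if taking this lock would violate the order.
 */
int
ordered_acquire(struct lock_ctx *ctx, int rank)
{
	int q;

	assert(rank >= 0 && rank < NRANKS);
	for (q = rank + 1; q < NRANKS; q++)
		if (ctx->held[q] > 0)
			return -1;
	ctx->held[rank]++;
	return 0;
}

/* Record the release of a lock of the given rank. */
void
ordered_release(struct lock_ctx *ctx, int rank)
{
	assert(rank >= 0 && rank < NRANKS);
	assert(ctx->held[rank] > 0);
	ctx->held[rank]--;
}
```

Under that assumed order, process A's sequence (union node, then upper
vnode) is accepted, while process B's (upper vnodes for sys and arch,
then the union node) is rejected until it drops the upper vnode locks.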

JSP, any ideas?
>Audit-Trail:
>Unformatted: