Subject: kern/777: union and null mounts don't play together like nice children
To: None <gnats-admin@NetBSD.ORG>
From: John Kohl <jtk@kolvir.blrc.ma.us>
List: netbsd-bugs
Date: 02/02/1995 19:20:08
>Number:         777
>Category:       kern
>Synopsis:       union and null mounts can deadlock/panic on locking protocols
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb  2 19:20:04 1995
>Originator:     John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release:        1.0A
>Environment:
	
System: NetBSD kolvir 1.0A NetBSD 1.0A (KOLVIR) #23: Sun Jan 29 22:20:00 EST 1995 jtk@kolvir:/u1/NetBSD-current/src/sys/arch/i386/compile/KOLVIR i386

>Description:

If a null mount covers a directory tree which is also part of a union
mount, the system can panic (on a DIAGNOSTIC kernel) or deadlock (on a
non-DIAGNOSTIC kernel).

I was trying to use the 'null' mount to get to the pre-union directory
tree, and the union mount to get to the (obviously) post-union directory
tree.  The relevant mounts were:

/dev/sd0f on /u1 type ufs (local)
/dev/wd0e on /u2 type ufs (local)
/u1/NetBSD-current on /u1/NetBSD-1.0A type null (local)
<below>:/u2 on /u1/NetBSD-current type union

The problem comes when the union file system needs to allocate a new
node.  The call chain is
union_allocvp()->getnewvnode()->vgone()->vclean()->VOP_LOCK().
When the vnode being recycled belongs to the null mount, VOP_LOCK() is
null_bypass(), which passes the request along to ufs_lock() on the
underlying vnode.

But the underlying UFS vnode is also the vnode from the upper union
layer, which the same process has already locked.  This results in a
"panic: locking against myself" on a DIAGNOSTIC kernel, or the process
deadlocking against itself on a non-DIAGNOSTIC kernel.  Bad News.
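
The sequence above can be replayed in a small userspace model.  This is
only a sketch under stated assumptions, not kernel code: every name in
it (model_vnode, model_ufs_lock, model_null_node, model_null_lock,
model_pr777) is hypothetical, and the lock is reduced to a
non-recursive, single-owner flag that reports the conflict instead of
sleeping or panicking.

```c
/* Userspace model of the lock sequence in this PR; not kernel code. */
#include <assert.h>
#include <stddef.h>

enum lock_result { LOCK_OK, LOCK_BUSY, LOCK_SELF };

/* Hypothetical stand-in for a UFS vnode lock: non-recursive, one owner. */
struct model_vnode {
	int locked;	/* nonzero while the lock is held */
	int owner;	/* pid of the holder, valid only while locked */
};

/* Hypothetical stand-in for a null-layer vnode stacked on a lower vnode. */
struct model_null_node {
	struct model_vnode *lower;
};

/*
 * Stand-in for ufs_lock().  Instead of sleeping or panicking it reports
 * LOCK_SELF when the caller already holds the lock (the "locking
 * against myself" case) and LOCK_BUSY when another pid holds it.
 */
static enum lock_result
model_ufs_lock(struct model_vnode *vp, int pid)
{
	assert(vp != NULL);
	if (vp->locked)
		return (vp->owner == pid) ? LOCK_SELF : LOCK_BUSY;
	vp->locked = 1;
	vp->owner = pid;
	return LOCK_OK;
}

/* Stand-in for null_bypass(): forward the lock request to the lower vnode. */
static enum lock_result
model_null_lock(struct model_null_node *np, int pid)
{
	assert(np != NULL);
	return model_ufs_lock(np->lower, pid);
}

/*
 * Replay the report: the union code holds the upper-layer UFS vnode,
 * then vnode recycling tries to lock a null vnode stacked on that same
 * UFS vnode, from the same process.
 */
static enum lock_result
model_pr777(int pid)
{
	struct model_vnode ufs = { 0, 0 };
	struct model_null_node nullvp = { &ufs };

	if (model_ufs_lock(&ufs, pid) != LOCK_OK)
		return LOCK_BUSY;
	return model_null_lock(&nullvp, pid);
}
```

In this model, model_pr777() returns LOCK_SELF for any pid: the second
lock request arrives from the process that already holds the lock.
That is the condition a DIAGNOSTIC kernel reports as a panic and a
non-DIAGNOSTIC kernel turns into a sleep that never ends.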

>How-To-Repeat:
	Make a mount tree like the above, reference files through the
null tree, and then through the union mount.  You may get very unlucky.

>Fix:

I'm not sure whether the blame is properly placed with the union or null
FS or the operator (for trying to be too clever).  I don't think there's
any simple solution to this locking violation.

>Audit-Trail:
>Unformatted: