Subject: kern/777: union and null mounts don't play together like nice children
To: None <gnats-admin@NetBSD.ORG>
From: John Kohl <jtk@kolvir.blrc.ma.us>
List: netbsd-bugs
Date: 02/02/1995 19:20:08
>Number: 777
>Category: kern
>Synopsis: union and null mounts can deadlock/panic on locking protocols
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 2 19:20:04 1995
>Originator: John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release: 1.0A
>Environment:
System: NetBSD kolvir 1.0A NetBSD 1.0A (KOLVIR) #23: Sun Jan 29 22:20:00 EST 1995 jtk@kolvir:/u1/NetBSD-current/src/sys/arch/i386/compile/KOLVIR i386
>Description:
If you have a null mount of a directory tree that is also part of a
union mount, you are prone to deadlock and/or panic (depending on
whether you run a DIAGNOSTIC kernel).
I was trying to use the 'null' mount to get to the pre-union directory
tree, and the union mount to get to the (obviously) post-union directory
tree. The relevant mounts were:
/dev/sd0f on /u1 type ufs (local)
/dev/wd0e on /u2 type ufs (local)
/u1/NetBSD-current on /u1/NetBSD-1.0A type null (local)
<below>:/u2 on /u1/NetBSD-current type union
The problem comes when the union file system needs to allocate a new
node. The call chain is
union_allocvp()->getnewvnode()->vgone()->vclean()->VOP_LOCK():
getnewvnode() recycles a free vnode, and here that vnode belongs to the
null layer. VOP_LOCK() on a null vnode is handled by null_bypass(),
which passes the call along to ufs_lock() on the underlying vnode.
But! The underlying UFS vnode is also the vnode in the upper layer of
the union mount, and this same process has already locked it. This
results in a "panic: locking against myself" on a DIAGNOSTIC kernel, or
a process deadlocked on its own lock on a non-DIAGNOSTIC kernel. Bad
News.
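To make the recursion concrete, here is a minimal user-space sketch,
assuming a greatly simplified model: each vnode has a non-recursive
exclusive lock, and a null-layer vnode forwards VOP_LOCK() to its lower
vnode the way null_bypass() does. The struct layout and the names
vop_lock(), ufs_lock() and reproduce_kern777() are hypothetical stand-ins,
not the real NetBSD 1.0A kernel interfaces; instead of panicking or
sleeping, the lock routine just reports what would happen.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the kern/777 locking problem.
 * These are NOT the real NetBSD 1.0A kernel structures or functions.
 */

enum lock_result {
    LOCK_OK,             /* lock acquired */
    LOCK_WOULD_SLEEP,    /* held by another process: real code sleeps */
    LOCK_AGAINST_MYSELF  /* held by the caller: panic or self-deadlock */
};

struct vnode {
    int           v_locked;  /* nonzero while exclusively locked */
    int           v_owner;   /* id of the "process" holding the lock */
    struct vnode *v_lower;   /* non-NULL only for a null-layer vnode */
};

/* Stand-in for ufs_lock(): a non-recursive exclusive lock. */
static enum lock_result ufs_lock(struct vnode *vp, int curproc)
{
    if (vp->v_locked)
        return vp->v_owner == curproc ? LOCK_AGAINST_MYSELF
                                      : LOCK_WOULD_SLEEP;
    vp->v_locked = 1;
    vp->v_owner = curproc;
    return LOCK_OK;
}

/* Stand-in for VOP_LOCK(): a null-layer vnode forwards the request down
 * the stack (as null_bypass() does); a UFS vnode locks itself. */
static enum lock_result vop_lock(struct vnode *vp, int curproc)
{
    if (vp->v_lower != NULL)
        return vop_lock(vp->v_lower, curproc);
    return ufs_lock(vp, curproc);
}

/*
 * The scenario from the report: the union layer holds the lock on a UFS
 * vnode, then union_allocvp() -> getnewvnode() recycles a null-layer
 * vnode stacked on that same UFS vnode, and vclean() calls VOP_LOCK().
 */
static enum lock_result reproduce_kern777(void)
{
    const int curproc = 1;
    struct vnode ufs_vp  = { 0, 0, NULL };     /* shared by both layers */
    struct vnode null_vp = { 0, 0, &ufs_vp };  /* null layer over ufs_vp */

    /* The union code locks its upper (UFS) vnode. */
    enum lock_result first = vop_lock(&ufs_vp, curproc);
    assert(first == LOCK_OK);
    (void)first;

    /* vclean() on the recycled null vnode: the request ends up at the
     * UFS vnode this process already holds. */
    return vop_lock(&null_vp, curproc);
}
```

Calling reproduce_kern777() returns LOCK_AGAINST_MYSELF, which is the
request a DIAGNOSTIC kernel turns into the panic and a non-DIAGNOSTIC
kernel turns into a process sleeping on a lock it holds itself.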
>How-To-Repeat:
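Make a mount tree like the above, reference files through the null
tree, and then through the union mount. Whether it bites depends on
luck: the union code has to recycle a null-layer vnode while it holds
the lock on the UFS vnode underneath it.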
>Fix:
I'm not sure whether the blame properly lies with the union FS, the null
FS, or the operator (for trying to be too clever). I don't think there's
any simple solution to this locking violation.
>Audit-Trail:
>Unformatted: