Subject: vclean deadlock risk
To: None <tech-kern@NetBSD.ORG>
From: Paul Kranenburg <pk@cs.few.eur.nl>
List: tech-kern
Date: 05/09/1998 15:25:11
I want to point out a possible deadlock scenario that has existed since
the lite2 integration.

Observations:

	(1) vrele() now calls vn_lock() before calling VOP_INACTIVE()
	    if the last vnode reference goes away.

	(2) vn_lock() sleeps unconditionally if VXLOCK is set.

	(3) vclean() sets VXLOCK and also adds and removes a reference
	    (using vref() and vrele()) to the vnode while it is being
	    cleaned. It takes this extra reference only if other
	    references still exist.

The deadlock is triggered if the vnode loses the other references
while vclean() is working on it: vclean() calls vrele() which makes
the last reference (held by vclean()) go away and the process gets
stuck in vn_lock() because VXLOCK is still set.
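
To make the sequence concrete, here is a small user-space toy model of
observations (1) to (3). It is not kernel code: the toy_* names, the
struct layout and the "refs dropped while sleeping" parameter are all
made up for illustration, and locking is reduced to a boolean that says
whether the real vn_lock() would go to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vnode; only the two fields the scenario needs. */
struct toy_vnode {
	int  usecount;	/* models v_usecount */
	bool xlock;	/* models the VXLOCK flag */
};

/* Observation (2): vn_lock() sleeps whenever VXLOCK is set. */
static bool
toy_vn_lock_would_sleep(const struct toy_vnode *vp)
{
	return vp->xlock;
}

/*
 * Observation (1): on the last reference, vrele() takes the vnode
 * lock before calling VOP_INACTIVE().  Returns true if the caller
 * would block in vn_lock().
 */
static bool
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount > 0)
		return false;		/* not the last reference */
	return toy_vn_lock_would_sleep(vp);
}

/*
 * Observation (3): vclean() sets VXLOCK and brackets its work with
 * vref()/vrele() if the vnode is still referenced.  The second
 * argument models other processes dropping their references while
 * VOP_CLOSE() sleeps.  Returns true if vclean() would hang.
 */
static bool
toy_vclean(struct toy_vnode *vp, int refs_dropped_while_sleeping)
{
	bool active = vp->usecount > 0;
	bool stuck = false;

	if (active)
		vp->usecount++;		/* vref() */
	vp->xlock = true;		/* VXLOCK set */

	/* Others cannot drop more references than they hold. */
	assert(refs_dropped_while_sleeping <= vp->usecount - (active ? 1 : 0));
	vp->usecount -= refs_dropped_while_sleeping;

	if (active)
		stuck = toy_vrele(vp);	/* may now be the last reference */
	if (!stuck)
		vp->xlock = false;	/* VXLOCK cleared when done */
	return stuck;
}
```

With one outside reference that goes away during the sleep,
toy_vclean() reports that it would block; if at least one outside
reference survives, it completes normally.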

The best way to make this happen is to find a vnode whose VOP_CLOSE()
will sleep (e.g. a tty that needs flushing), giving other processes
an opportunity to dispose of their references to the same vnode
in between. Doing a system shutdown is a good real-world example
of such a scenario.

I don't have an obvious fix, but I want to note that it seems inappropriate
to have vrele() do a VOP_INACTIVE() after the vnode has already been
inactivated _and_ reclaimed from within vclean(). Maybe we should
consider inlining the relevant parts of vrele() at this point.
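
As a rough illustration of what "inlining the relevant parts" could
mean, the toy fragment below drops vclean()'s own reference without
going through vn_lock() or VOP_INACTIVE(). This is only a sketch under
assumptions, not a proposed patch: the sketch_* names and the
on_freelist field are hypothetical, and I am assuming the only
bookkeeping that remains is the reference count and the free list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; not the real struct vnode. */
struct sketch_vnode {
	int  usecount;		/* models v_usecount */
	bool xlock;		/* models the VXLOCK flag */
	bool on_freelist;	/* assumed: models free list membership */
};

/*
 * Hypothetical helper for use at the end of vclean(): release the
 * reference vclean() took earlier.  The vnode has already been
 * inactivated and reclaimed at this point, so neither vn_lock()
 * nor VOP_INACTIVE() is called and nothing here can sleep on VXLOCK.
 */
static void
sketch_vclean_drop_ref(struct sketch_vnode *vp)
{
	assert(vp->xlock);		/* only valid while cleaning */
	assert(vp->usecount > 0);
	if (--vp->usecount == 0)
		vp->on_freelist = true;	/* last reference: make reusable */
}
```

Whether other parts of vrele() would also need to be carried over is
exactly the open question.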

-pk