netbsd-bugs: kern/13194: PR_WAITOK within LOCK: pool_get(..., PR_WAITOK) in syssrc/uvm/uvm_aobj.c:uao_find_swhash

Subject: kern/13194: PR_WAITOK within LOCK: pool_get(..., PR_WAITOK) in syssrc/uvm/uvm_aobj.c:uao_find_swhash_elt
To: None <netbsd-bugs@netbsd.org, smd@ebone.net>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: netbsd-bugs
Date: 06/18/2001 13:19:16

Here's an analysis of the bug.  chuq will be looking into fixing it.

Jason and I added some asserts to the pool code to catch attempts to
invoke pool_get(..., PR_WAITOK) while the cpu was holding a spinlock.
(Sleeping while holding a spinlock is disallowed, because it leads to
deadlock.)
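
A minimal user-space sketch of the kind of check described above.  It
is not the actual pool code: every sk_* name, the flag values, and the
spinlock counter are hypothetical stand-ins, and malloc() stands in
for the real allocator.  The only idea carried over is "refuse a
sleeping allocation while a spinlock is held".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag values; the real PR_* constants are not reproduced. */
#define SK_PR_WAITOK 0x01   /* caller is willing to sleep for memory */
#define SK_PR_NOWAIT 0x02   /* caller wants NULL instead of sleeping */

/* Hypothetical count of spinlocks held by the current CPU. */
static int sk_spinlocks_held;

static void sk_spin_lock(void)   { sk_spinlocks_held++; }
static void sk_spin_unlock(void) { sk_spinlocks_held--; }

/* True when an allocation with these flags is legal in this context. */
static bool sk_pool_get_allowed(int flags)
{
    return !((flags & SK_PR_WAITOK) != 0 && sk_spinlocks_held > 0);
}

/* Toy allocator: asserts on the illegal combination, then allocates. */
static void *sk_pool_get(size_t size, int flags)
{
    assert(sk_pool_get_allowed(flags));
    return malloc(size);
}
```

With this in place, a call such as sk_pool_get(n, SK_PR_WAITOK) made
between sk_spin_lock() and sk_spin_unlock() trips the assertion, while
the same call with SK_PR_NOWAIT does not.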

This turned up the following potential pagedaemon deadlock, which
looks like a mess to fix.

In this case, the pageout daemon calls pool_get(..., PR_WAITOK).  

The pageout daemon is likely to be one of the few things which is
freeing up pages under load, which means that if the pool ever runs
out of memory, the system is probably going to deadlock shortly
anyway.

So, crawling from the bottom up:

uao_find_swhash_elt() does not provide any way to return failure in
the (create != 0) case; it calls pool_get(&uao_swhash_elt_pool,
PR_WAITOK) indicating that it wants to wait until hell freezes over
for free memory.

uao_find_swhash_elt() is only called with a nonzero "create" parameter
from uao_set_swslot(), which similarly does not provide any way to
return failure in the slot != 0 case (i.e., the create != 0 case which
will want to allocate something).

uao_set_swslot is called with a nonzero `slot' value in several
places:

	uao_get(), in the i/o error case (setting SWSLOT_BAD)
	uvm_pager_put(), in an i/o error case.
	uvmpd_scan_inactive(), when allocating swap-backed pages to an
		anonymous object.

It looks like fixing this will be messy.

 1) uao_find_swhash_elt() will need to be changed so it can return NULL
in the "create" case.

 2) uao_set_swslot will need to be changed to return a failure
indication when it couldn't set the slot.

 3) error handling in callers of uao_set_swslot:
	a) uao_get(): failure here is unlikely to occur, as the object
	presumably already had the hash element allocated to it holding
	the swslot value.  there might be race conditions which
	deallocate the swap page, though..

	b) uvm_pager_put():
	this only happens in the synchronous i/o case, in the
	"dropcluster" code.  i'm not sure how to handle a transient
	uao_set_swslot failure here.

	c) uvmpd_scan_inactive():
	the cleanup case would be to unbusy the page, back it out of
	the cluster, and skip it.
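
The three steps above could be sketched roughly as follows.  This is a
toy user-space model, not a patch: the sk_* names, the eight-entry
table, the ENOMEM return convention, and the sk_pool_empty switch are
all assumptions made for illustration, and the real functions'
signatures and locking are not reproduced.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct sk_swhash_elt { int slot; };

/* Hypothetical stand-in for a non-sleeping pool allocation. */
static bool sk_pool_empty;                 /* set to simulate exhaustion */
static struct sk_swhash_elt *sk_pool_get_nowait(void)
{
    if (sk_pool_empty)
        return NULL;
    return calloc(1, sizeof(struct sk_swhash_elt));
}

#define SK_NSLOTS 8
static struct sk_swhash_elt *sk_table[SK_NSLOTS];  /* toy "hash" table */

/* Step 1: the lookup may return NULL even when create is set. */
static struct sk_swhash_elt *sk_find_swhash_elt(unsigned pageidx, bool create)
{
    struct sk_swhash_elt *elt = sk_table[pageidx % SK_NSLOTS];

    if (elt != NULL || !create)
        return elt;
    elt = sk_pool_get_nowait();
    if (elt != NULL)
        sk_table[pageidx % SK_NSLOTS] = elt;
    return elt;
}

/* Step 2: report failure instead of assuming the element exists. */
static int sk_set_swslot(unsigned pageidx, int slot)
{
    struct sk_swhash_elt *elt = sk_find_swhash_elt(pageidx, slot != 0);

    if (elt == NULL)
        return slot != 0 ? ENOMEM : 0;  /* clearing an absent slot is fine */
    elt->slot = slot;
    return 0;
}

/* Step 3c: the caller skips the page when the slot cannot be recorded. */
static bool sk_try_page(unsigned pageidx, int slot)
{
    if (sk_set_swslot(pageidx, slot) != 0) {
        /* real code would unbusy the page and back it out of the cluster */
        return false;
    }
    return true;
}
```

In this model a page whose element already exists can still have its
slot overwritten while the pool is empty, which matches the
expectation in (a) that failure is unlikely there; only a first-time
assignment, as in (c), can fail.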