Subject: the NFS removeit/sillyrename crash
To: None <firstname.lastname@example.org>
From: Matthias Drochner <M.Drochner@fz-juelich.de>
Date: 04/16/2003 22:52:24
Now that I got sick of the box dropping into DDB during reboot almost
every time after I did a system build, I looked into the issue.
The problem is that the directory vnode used by nfs_removeit() is not
necessarily valid anymore.
If the directory vnode got cleaned up between the sillyrename() and the
inactive(), one gets a panic.
This is a problem in 2 cases:
- under heavy system load (PR kern/9491) -- I couldn't reproduce this case,
  but it looks obvious
- during shutdown (PR kern/9326, kern/11284) -- vflush() just works through
  a list of associated vnodes, and it can happen that a directory vnode is
  vgone'd before the sillyrename'd files within it
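
The shutdown case can be modelled in user space. The following is only a
minimal sketch: struct model_vnode, model_removeit() and model_flush() are
hypothetical names made up for illustration, not the kernel's structures or
functions, and a "reclaimed" flag stands in for a vnode that has been
vgone'd. It shows that walking the list strictly in order trips over a
reclaimed directory whenever the directory comes before its sillyrenamed
file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; NOT the kernel's struct vnode. */
struct model_vnode {
    const char *name;
    bool reclaimed;                 /* set once the node has been "vgone'd" */
    struct model_vnode *silly_dir;  /* directory remembered at sillyrename
                                       time, or NULL if nothing is pending */
};

/*
 * Model of the deferred remove done when a sillyrenamed file goes
 * inactive. Returns false in the situation where the real code would
 * be using a directory vnode that is no longer valid.
 */
static bool model_removeit(const struct model_vnode *vp)
{
    if (vp->silly_dir == NULL)
        return true;                /* no sillyrename pending */
    return !vp->silly_dir->reclaimed;
}

/*
 * Model of a flush that reclaims nodes strictly in list order.
 * Returns how many deferred removes hit an already reclaimed directory.
 */
static int model_flush(struct model_vnode **list, size_t n)
{
    int stale = 0;

    assert(list != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!model_removeit(list[i]))
            stale++;
        list[i]->reclaimed = true;
    }
    return stale;
}
```

With a list ordered { dir, file } the model reports one stale directory
use; with { file, dir } it reports none, so whether the problem shows up
depends only on the list order.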
I can think of 3 approaches:
a) Don't refer to the directory vnode for sillyrenames - just save the
file handle and the other necessary information (e.g. v2/v3). This would
require implementing the "remove" RPC call specifically for this purpose.
b) Rethink the locking. Add a reference to the directory vnode if a
"sillyrename" occurs, and hack some more code to obey this at vflush()
time. This forces all dirs where sillyrenames occurred to occupy vnodes.
c) In nfs_inactive, if it is a directory, check whether there are
sillyrenames pending on it and process them first. This needs more
consideration because the sillyrenamed vnode might still be in use...
At the moment, only (a) looks viable to me.
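
To make (a) a bit more concrete, here is a rough user-space sketch of the
idea, not a patch. Everything in it is an assumption made for illustration:
the struct and function names are hypothetical, and I assume that the
handle to save is the directory's, since REMOVE takes a directory handle
plus a name. The record holds a private copy of the handle instead of a
vnode pointer, so the later remove needs nothing from the directory vnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_FHSIZE  64  /* v3 handles are at most 64 bytes, v2 ones are 32 */
#define MODEL_NAMELEN 32  /* arbitrary limit for this sketch */

/* Hypothetical sillyrename record; NOT the kernel's struct sillyrename. */
struct model_sillyrename {
    unsigned char dir_fh[MODEL_FHSIZE]; /* private copy of the handle */
    size_t dir_fhlen;
    bool is_v3;                         /* which protocol version to speak */
    char name[MODEL_NAMELEN];           /* the ".nfs..." name to remove */
};

/* Hypothetical REMOVE arguments: directory handle plus name. */
struct model_remove_args {
    const unsigned char *dir_fh;
    size_t dir_fhlen;
    bool is_v3;
    const char *name;
};

/* At sillyrename time: copy everything needed for the later remove. */
static bool model_sillyrename_save(struct model_sillyrename *sp,
    const unsigned char *fh, size_t fhlen, bool is_v3, const char *name)
{
    assert(sp != NULL && fh != NULL && name != NULL);
    if (fhlen > sizeof(sp->dir_fh) || strlen(name) >= sizeof(sp->name))
        return false;
    memcpy(sp->dir_fh, fh, fhlen);
    sp->dir_fhlen = fhlen;
    sp->is_v3 = is_v3;
    strcpy(sp->name, name);             /* length was checked above */
    return true;
}

/* At inactive time: build the arguments from the record alone. */
static struct model_remove_args
model_removeit_args(const struct model_sillyrename *sp)
{
    struct model_remove_args a = {
        sp->dir_fh, sp->dir_fhlen, sp->is_v3, sp->name
    };
    return a;
}
```

The point of the copy is that the caller's handle buffer (in the kernel,
whatever hangs off the directory vnode) can go away after the save without
affecting the arguments built later.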
Any ideas for this?