Subject: the NFS removeit/sillyrename crash
To: None <tech-kern@netbsd.org>
From: Matthias Drochner <M.Drochner@fz-juelich.de>
List: tech-kern
Date: 04/16/2003 22:52:24
I got sick of the box dropping into DDB during reboot almost every time
after a system build, so I looked into the issue.

The problem is that the directory vnode used by nfs_removeit() is not
necessarily valid anymore by the time it is called.
If the directory vnode gets cleaned up between the sillyrename() and the
inactive() of the renamed file, the result is a panic.
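To make the failure concrete, here is a simplified user-space model of the
ordering problem. All names (vnode_model, vgone_model, removeit_model, ...)
are hypothetical stand-ins, not the NetBSD kernel code. The model returns an
error at the point where the kernel would dereference the reclaimed
directory and panic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real NetBSD structures. */
enum vtype_model { VREG_M, VDIR_M, VBAD_M };

struct nfsdata_model {
    char fh[16];                       /* opaque NFS file handle */
};

struct vnode_model {
    enum vtype_model v_type;
    struct nfsdata_model *v_data;      /* NULL once the vnode is cleaned */
};

struct sillyrename_model {
    struct vnode_model *s_dvp;         /* directory holding the renamed file */
    char s_name[20];                   /* hidden name, e.g. ".nfsA4f2" */
};

static struct vnode_model *newdir_model(const char *fh)
{
    struct vnode_model *vp = calloc(1, sizeof(*vp));

    assert(vp != NULL);
    vp->v_type = VDIR_M;
    vp->v_data = calloc(1, sizeof(*vp->v_data));
    assert(vp->v_data != NULL);
    strncpy(vp->v_data->fh, fh, sizeof(vp->v_data->fh) - 1);
    return vp;
}

/* Roughly what vgone() leaves behind: the vnode survives, its NFS data does not. */
static void vgone_model(struct vnode_model *vp)
{
    free(vp->v_data);
    vp->v_data = NULL;
    vp->v_type = VBAD_M;
}

/*
 * The removal done at inactive() time needs the parent directory's file
 * handle. Returns 0 and copies the handle on success; returns -1 where
 * the kernel would dereference the reclaimed directory and panic.
 */
static int removeit_model(const struct sillyrename_model *sp,
                          char *fh_out, size_t fh_len)
{
    const struct vnode_model *dvp = sp->s_dvp;

    assert(fh_len > 0);
    if (dvp->v_type == VBAD_M || dvp->v_data == NULL)
        return -1;
    strncpy(fh_out, dvp->v_data->fh, fh_len - 1);
    fh_out[fh_len - 1] = '\0';
    return 0;
}
```

Creating a directory, recording a sillyrename against it, calling
vgone_model() on the directory and only then removeit_model() on the record
reproduces the problematic sequence.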

This happens in two cases:
-under heavy system load (PR kern/9491); I couldn't reproduce this case,
 but the cause looks obvious
-during shutdown (PR kern/9326, kern/11284): vflush() simply works through
 the list of vnodes associated with the mount, and a directory vnode may
 be vgone'd before the sillyrenamed files within it
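A toy model of the shutdown case, assuming vflush() reclaims vnodes strictly
in list order: whether the removal finds a live directory depends only on
where the directory sits in the list. vn_model and vflush_model are
hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one vnode on a mount's vnode list. */
struct vn_model {
    int alive;         /* cleared once the vnode has been vgone'd */
    int silly_parent;  /* index of the parent directory if this file has a
                          pending sillyrename, otherwise -1 */
};

/*
 * Reclaim every vnode in list order, loosely like vflush() at unmount.
 * Reclaiming a file with a pending sillyrename runs its inactive
 * processing, which needs the parent directory. Returns how many
 * removals would have found the directory already reclaimed (a panic
 * in the kernel).
 */
static int vflush_model(struct vn_model *list, size_t n)
{
    int dead_parents = 0;

    for (size_t i = 0; i < n; i++) {
        int p = list[i].silly_parent;

        assert(p < (int)n);              /* indices must be in range */
        if (p >= 0 && !list[p].alive)
            dead_parents++;
        list[i].alive = 0;               /* vgone() this vnode */
    }
    return dead_parents;
}
```

With the directory at index 0 and the sillyrenamed file at index 1,
vflush_model() reports one removal against a dead directory; swapping the
two entries reports none.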

Three approaches come to mind:
a) Don't refer to the directory vnode for sillyrenames; just save the
   directory's file handle and the other necessary information, e.g.
   whether it is v2 or v3. This would require implementing the "remove"
   RPC call specifically for this purpose.
b) Rethink the locking: add a reference to the directory vnode when a
   sillyrename occurs, and hack some more code to respect this at vflush()
   time. Forcing every directory in which a sillyrename occurred to keep
   occupying a vnode is nasty, though.
c) In nfs_inactive(), if the vnode is a directory, check whether
   sillyrenames are pending on it and process them first. This needs more
   thought because the sillyrenamed vnode might still be in use.
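A minimal sketch of what approach (a) might store, assuming the removal can
be issued from saved state alone: the record keeps a copy of the
directory's file handle, the protocol version and the hidden name instead
of a directory vnode pointer. The struct, macro and function names are
hypothetical. As a check on what a dedicated REMOVE call would have to
encode, remove_args_len() computes the XDR size of its arguments: a fixed
32-byte handle for NFSv2 (RFC 1094) or a variable-length handle of up to
64 bytes for NFSv3 (RFC 1813), followed by the file name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFSX_V2FH_MODEL     32  /* NFSv2 handles: fixed 32 bytes */
#define NFSX_V3FHMAX_MODEL  64  /* NFSv3 handles: at most 64 bytes */

/* Hypothetical sillyrename record for approach (a): no vnode pointer. */
struct sillyrename_fh {
    int           s_nfsvers;                  /* 2 or 3 */
    size_t        s_dfhlen;                   /* directory handle length */
    unsigned char s_dfh[NFSX_V3FHMAX_MODEL];  /* copy of directory handle */
    char          s_name[20];                 /* hidden name, e.g. ".nfsA4f2" */
};

/* Capture what REMOVE needs while the directory vnode is still valid. */
static int sillyrename_save(struct sillyrename_fh *sp, int vers,
                            const unsigned char *dfh, size_t dfhlen,
                            const char *name)
{
    if (vers != 2 && vers != 3)
        return -1;
    if (vers == 2 && dfhlen != NFSX_V2FH_MODEL)
        return -1;
    if (vers == 3 && dfhlen > NFSX_V3FHMAX_MODEL)
        return -1;
    if (strlen(name) >= sizeof(sp->s_name))
        return -1;
    sp->s_nfsvers = vers;
    sp->s_dfhlen = dfhlen;
    memcpy(sp->s_dfh, dfh, dfhlen);
    strcpy(sp->s_name, name);
    return 0;
}

/* XDR pads opaque data and strings to a multiple of 4 bytes. */
static size_t xdr_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Encoded size of the REMOVE arguments (directory handle + name). */
static size_t remove_args_len(const struct sillyrename_fh *sp)
{
    size_t name_xdr = 4 + xdr_pad4(strlen(sp->s_name)); /* length + bytes */

    if (sp->s_nfsvers == 2)
        return NFSX_V2FH_MODEL + name_xdr;          /* fixed-size fhandle */
    return 4 + xdr_pad4(sp->s_dfhlen) + name_xdr;    /* length + opaque data */
}
```

Because the record owns copies of everything the RPC needs, it no longer
matters whether the directory vnode has been vgone'd by the time the file
goes inactive; the cost is the extra REMOVE code path mentioned above.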

At the moment, only (a) looks viable to me.

Any ideas on this?

best regards
Matthias