Subject: Re: the NFS removeit/sillyrename crash
To: Matthias Drochner <M.Drochner@fz-juelich.de>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/22/2003 09:24:31
On Wed, 16 Apr 2003, Matthias Drochner wrote:

>
> Now that I got sick of the box dropping into DDB during reboot almost
> every time after I did a system build, I looked into the issue.
>
> The problem is that the directory vnode used by nfs_removeit() is not
> necessarily valid anymore.
> If the directory vnode got cleaned up between the sillyrename() and the
> inactive(), one gets a panic.
>
> This is a problem in 2 cases:
> -on heavy system load (PR kern/9491) -- I couldn't reproduce this case,
>  but it looks obvious
> -during shutdown (PR kern/9326, kern/11284) -- vflush() just works through
>  a list of associated vnodes, and it happens that a directory vnode is
>  vgone'd before the sillyrename'd files within it
>
> So there are 3 approaches coming to my mind:
> a) Don't refer to the directory vnode for sillyrenames - just save the
>    file handle and necessary information as eg v2/v3. This would require
>    implementing the "remove" RPC call especially for this purpose.
> b) Rethink the locking. Add a reference to the directory vnode if a
>    "sillyrename" occurs, and hack some more code to obey this at vflush()
>    time. Forcing all dirs where sillyrenames occurred to occupy vnodes
>    is nasty...
> c) In nfs_inactive, if it is a directory, check whether there are
>    sillyrenames pending on it and process them first. This needs more
>    considerations because the sillyrenamed vnode might be still in use...
>
> Atm, only (a) looks viable to me.

(a) looks like the best option.
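
To make (a) concrete, here is a rough userland sketch of the idea, not
actual kernel code. It assumes the sillyrename record keeps its own copy
of the directory's file handle plus the temporary name, so nothing points
at the directory vnode anymore. The names silly_rec, silly_save and
silly_remove_args, and the 32-byte name limit, are made up for
illustration. The packing routine only stands in for building the
arguments of a dedicated "remove" call; it is not real XDR encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SILLY_FHMAX   64  /* an NFSv3 file handle is at most 64 bytes */
#define SILLY_NAMEMAX 32  /* hypothetical limit for the temporary name */

/*
 * Hypothetical record: holds a copy of the directory's file handle
 * instead of a pointer to the directory vnode.
 */
struct silly_rec {
	unsigned char sr_dirfh[SILLY_FHMAX];
	size_t        sr_dirfhlen;
	char          sr_name[SILLY_NAMEMAX];
	size_t        sr_namelen;
};

/*
 * Called at sillyrename time: copy everything the later remove needs.
 * Returns 0 on success, -1 if the handle or the name does not fit.
 */
static int
silly_save(struct silly_rec *sr, const void *dirfh, size_t fhlen,
    const char *name)
{
	size_t namelen;

	assert(sr != NULL && dirfh != NULL && name != NULL);
	namelen = strlen(name);
	if (fhlen == 0 || fhlen > SILLY_FHMAX ||
	    namelen == 0 || namelen >= SILLY_NAMEMAX)
		return -1;
	memcpy(sr->sr_dirfh, dirfh, fhlen);
	sr->sr_dirfhlen = fhlen;
	memcpy(sr->sr_name, name, namelen + 1);	/* include the NUL */
	sr->sr_namelen = namelen;
	return 0;
}

/*
 * Stand-in for the dedicated "remove" call: pack (dir handle, name)
 * into a flat buffer using only the saved copies. No vnode is
 * consulted. Returns the number of bytes written, or -1 if the
 * buffer is too small.
 */
static long
silly_remove_args(const struct silly_rec *sr, unsigned char *buf,
    size_t buflen)
{
	size_t need = sr->sr_dirfhlen + sr->sr_namelen;

	if (need > buflen)
		return -1;
	memcpy(buf, sr->sr_dirfh, sr->sr_dirfhlen);
	memcpy(buf + sr->sr_dirfhlen, sr->sr_name, sr->sr_namelen);
	return (long)need;
}
```

The point of the copy is that the record stays usable even if the
directory vnode has been vgone'd in the meantime, which is the situation
in both the heavy-load and the shutdown case.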

Thanks for looking into this.

Take care,

Bill