Subject: Re: the NFS removeit/sillyrename crash
To: Matthias Drochner <M.Drochner@fz-juelich.de>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/22/2003 09:24:31
On Wed, 16 Apr 2003, Matthias Drochner wrote:
>
> Now that I got sick of the box dropping into DDB during reboot almost
> every time after I did a system build, I looked into the issue.
>
> The problem is that the directory vnode used by nfs_removeit() is not
> necessarily valid anymore.
> If the directory vnode got cleaned up between the sillyrename() and the
> inactive(), one gets a panic.
>
> This is a problem in 2 cases:
> -under heavy system load (PR kern/9491) -- I couldn't reproduce this case,
> but it looks obvious
> -during shutdown (PR kern/9326, kern/11284) -- vflush() just works through
> a list of associated vnodes, and it happens that a directory vnode is
> vgone'd before the sillyrename'd files within it
>
> So there are 3 approaches coming to my mind:
> a) Don't refer to the directory vnode for sillyrenames - just save the
> file handle and necessary information such as v2/v3. This would require
> implementing the "remove" RPC call specifically for this purpose.
> b) Rethink the locking. Add a reference to the directory vnode if a
> "sillyrename" occurs, and hack some more code to obey this at vflush()
> time. Forcing all dirs where sillyrenames occurred to occupy vnodes
> is nasty...
> c) In nfs_inactive, if it is a directory, check whether there are
> sillyrenames pending on it and process them first. This needs more
> consideration because the sillyrenamed vnode might still be in use...
>
> At the moment, only (a) looks viable to me.
(a) looks like the best option.
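
To make the idea behind (a) concrete, here is a rough user-space sketch,
not kernel code. It assumes the sillyrename record keeps its own copy of
the directory's file handle and the temporary name, so the later remove
never dereferences the directory vnode. Every name in it (toy_dirnode,
toy_silly, toy_remove_rpc and so on) is made up for illustration and does
not exist in the NetBSD tree; the "RPC" just logs its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_FHSIZE  64  /* NFSv3 file handles are at most 64 bytes */
#define TOY_NAMELEN 32  /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a directory vnode; it may go away at any time. */
struct toy_dirnode {
    unsigned char fh[TOY_FHSIZE];
    size_t        fhlen;
};

/* Sillyrename record: owns copies of the data, holds no pointer to the dir. */
struct toy_silly {
    unsigned char dir_fh[TOY_FHSIZE];
    size_t        dir_fhlen;
    char          name[TOY_NAMELEN];
};

/* Stand-in for the wire: remembers what the last "remove" was asked to do. */
struct toy_rpc_log {
    unsigned char fh[TOY_FHSIZE];
    size_t        fhlen;
    char          name[TOY_NAMELEN];
    int           calls;
};

/* At sillyrename time: copy the handle and name out of the directory node. */
void toy_silly_record(struct toy_silly *sp, const struct toy_dirnode *dp,
                      const char *name)
{
    assert(dp->fhlen <= TOY_FHSIZE);
    memcpy(sp->dir_fh, dp->fh, dp->fhlen);
    sp->dir_fhlen = dp->fhlen;
    snprintf(sp->name, sizeof(sp->name), "%s", name);
}

/* Hypothetical "remove" that takes a raw file handle instead of a vnode. */
void toy_remove_rpc(struct toy_rpc_log *log, const unsigned char *fh,
                    size_t fhlen, const char *name)
{
    assert(fhlen <= TOY_FHSIZE);
    memcpy(log->fh, fh, fhlen);
    log->fhlen = fhlen;
    snprintf(log->name, sizeof(log->name), "%s", name);
    log->calls++;
}

/* At inactive time: only the saved copies are used, never the dir node. */
void toy_silly_remove(const struct toy_silly *sp, struct toy_rpc_log *log)
{
    toy_remove_rpc(log, sp->dir_fh, sp->dir_fhlen, sp->name);
}
```

The point of the copy is that the order in which vflush() tears things
down stops mattering: the directory node can be gone by the time
toy_silly_remove() runs, at the cost of a few dozen bytes per
sillyrenamed file.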
Thanks for looking into this.
Take care,
Bill