NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race condition not handled
The following reply was made to PR kern/46990; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race
condition not handled
Date: Fri, 21 Sep 2012 15:40:49 +0200
Hi,
I've still not added this code fragment bacause I've just shut down my
test system for weekend.
But this fix is most like the same as I did - except that your's will
not panic if KASSERT is enabled as my version did ...
I've added the printf if I detect dnode == NULL at the begin of the loop
and I got it, as I mentioned before.
I think this patch will solve the problem, but I will retest on monday.
till then
W. Stukenbrock
Taylor R Campbell wrote:
> Date: Fri, 21 Sep 2012 13:10:26 +0200
> From: Wolfgang Stukenbrock
> <wolfgang.stukenbrock%nagler-company.com@localhost>
>
> If we can exclude the first loop pass, than the "problem" must happen
> while walking up the parent chain.
> We will vput(vp) after getting tn_vlock on the parent - so we have no
> lock on any vnode anymore.
> The normal way is to get tn_vlock and than seaches for the vnode by
> calling tmpfs_vnode_get(). There we have two different ways to handle
> the locks:
> ...
>
> It looks to me as though this is the source of the problem. I didn't
> look closely enough at tmpfs_vnode_get before, but now that I look at
> it, it seems that the only invariant it preserves is the existence of
> the tmpfs node -- that is, tmpfs_reclaim will not free the tmpfs node
> while we are getting a vnode for it. All other invariants may be lost
> because tmpfs_vnode_get drops all locks except the tmpfs reclaiming
> bit.
>
> In that case, it is necessary to check tmpfs_rmdired_p after
> tmpfs_vnode_get, so try adding the following fragment at the very end
> of the loop, after `error = tmpfs_vnode_get(mp, dnode, &vp); if
> (error) return error;':
>
> /*
> * tmpfs_vnode_get only guarantees that dnode will not
> * be freed while we get a vnode for it. It does not
> * preserve any other invariants, so we must check
> * whether the parent has been removed in the meantime.
> */
> if (tmpfs_rmdired_p(vp)) {
> printf("tmpfs_vnode_get lost race with rmdir\n");
> vput(vp);
> return ENOENT;
> }
>
> Can you try this and see how dirconc goes? In particular, please let
> me know if you hit the printf.
>
> What will happen if I've a reference to a directory (e.g. current-dir)
> and someone delete it and the parent of it too?
>
> Then you'll have a reference to the vnode for a deleted directory.
> But if you have the vnode locked, then nothing can delete it.
>
> I've seen no code that will destroy the link between the vnode and the
> node structure till now, but I assume it is present somewhere.
>
> This happens in tmpfs_reclaim, when the vnode ceases to be referenced.
>
> As a second action all WHITEOUT children of the just removed directory
> are simply deleted during directory delete operation. (calls to
> tmpfs_dir_detach and tmpfs_free_dirent without any checks for vnode
> references.)
> If there are still vnodes associated to the nodes in the dirents, is
> this code save?
>
> WHITEOUT stuff is not relevant unless you are using onionfs, so don't
> worry about that logic.
>
> Just as information:
> The last problem I've fixed before was the sync with the cache after
> rename occured.
> In Head the call to genfs_rename_cache_purge() was commented out in
> tmpfs_gro_rename.
> 5.1 seems to need it, because I've ended up with having the object in
> the correct tmpfs-structure, but when accessing it in the cache it is
> not found.
> Symtom: "ls <dir>" will show the name but "ls -l <dir>" say "file not
> found".
>
> This is not surprising. It's probably safe to uncomment all of the
> commented cache purges that I left scattered around -- the only reason
> I didn't do that was to preserve the existing behaviour and to remind
> me that the namecache needs careful analysis which I haven't done.
> But as a stop-gap before that analysis is done, overzealously purging
> the cache may improve correctness and will probably at worst only hurt
> performance.
>
Home |
Main Index |
Thread Index |
Old Index