NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race condition not handled
The following reply was made to PR kern/46990; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race
condition not handled
Date: Fri, 21 Sep 2012 13:12:46 +0000
Date: Fri, 21 Sep 2012 13:10:26 +0200
From: Wolfgang Stukenbrock
<wolfgang.stukenbrock%nagler-company.com@localhost>
If we can exclude the first loop pass, than the "problem" must happen
while walking up the parent chain.
We will vput(vp) after getting tn_vlock on the parent - so we have no
lock on any vnode anymore.
The normal way is to get tn_vlock and than seaches for the vnode by
calling tmpfs_vnode_get(). There we have two different ways to handle
the locks:
...
It looks to me as though this is the source of the problem. I didn't
look closely enough at tmpfs_vnode_get before, but now that I look at
it, it seems that the only invariant it preserves is the existence of
the tmpfs node -- that is, tmpfs_reclaim will not free the tmpfs node
while we are getting a vnode for it. All other invariants may be lost
because tmpfs_vnode_get drops all locks except the tmpfs reclaiming
bit.
In that case, it is necessary to check tmpfs_rmdired_p after
tmpfs_vnode_get, so try adding the following fragment at the very end
of the loop, after `error = tmpfs_vnode_get(mp, dnode, &vp); if
(error) return error;':
/*
* tmpfs_vnode_get only guarantees that dnode will not
* be freed while we get a vnode for it. It does not
* preserve any other invariants, so we must check
* whether the parent has been removed in the meantime.
*/
if (tmpfs_rmdired_p(vp)) {
printf("tmpfs_vnode_get lost race with rmdir\n");
vput(vp);
return ENOENT;
}
Can you try this and see how dirconc goes? In particular, please let
me know if you hit the printf.
What will happen if I've a reference to a directory (e.g. current-dir)
and someone delete it and the parent of it too?
Then you'll have a reference to the vnode for a deleted directory.
But if you have the vnode locked, then nothing can delete it.
I've seen no code that will destroy the link between the vnode and the
node structure till now, but I assume it is present somewhere.
This happens in tmpfs_reclaim, when the vnode ceases to be referenced.
As a second action all WHITEOUT children of the just removed directory
are simply deleted during directory delete operation. (calls to
tmpfs_dir_detach and tmpfs_free_dirent without any checks for vnode
references.)
If there are still vnodes associated to the nodes in the dirents, is
this code save?
WHITEOUT stuff is not relevant unless you are using onionfs, so don't
worry about that logic.
Just as information:
The last problem I've fixed before was the sync with the cache after
rename occured.
In Head the call to genfs_rename_cache_purge() was commented out in
tmpfs_gro_rename.
5.1 seems to need it, because I've ended up with having the object in
the correct tmpfs-structure, but when accessing it in the cache it is
not found.
Symtom: "ls <dir>" will show the name but "ls -l <dir>" say "file not
found".
This is not surprising. It's probably safe to uncomment all of the
commented cache purges that I left scattered around -- the only reason
I didn't do that was to preserve the existing behaviour and to remind
me that the namecache needs careful analysis which I haven't done.
But as a stop-gap before that analysis is done, overzealously purging
the cache may improve correctness and will probably at worst only hurt
performance.
Home |
Main Index |
Thread Index |
Old Index