NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race condition not handled

The following reply was made to PR kern/46990; it has been noted by GNATS.

From: Taylor R Campbell <>
To: Wolfgang Stukenbrock <>
Subject: Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race 
condition not handled
Date: Fri, 21 Sep 2012 13:12:46 +0000

    Date: Fri, 21 Sep 2012 13:10:26 +0200
    From: Wolfgang Stukenbrock 
    If we can exclude the first loop pass, than the "problem" must happen 
    while walking up the parent chain.
    We will vput(vp) after getting tn_vlock on the parent - so we have no 
    lock on any vnode anymore.
    The normal way is to get tn_vlock and than seaches for the vnode by 
    calling tmpfs_vnode_get(). There we have two different ways to handle 
    the locks:
 It looks to me as though this is the source of the problem.  I didn't
 look closely enough at tmpfs_vnode_get before, but now that I look at
 it, it seems that the only invariant it preserves is the existence of
 the tmpfs node -- that is, tmpfs_reclaim will not free the tmpfs node
 while we are getting a vnode for it.  All other invariants may be lost
 because tmpfs_vnode_get drops all locks except the tmpfs reclaiming
 In that case, it is necessary to check tmpfs_rmdired_p after
 tmpfs_vnode_get, so try adding the following fragment at the very end
 of the loop, after `error = tmpfs_vnode_get(mp, dnode, &vp); if
 (error) return error;':
                 * tmpfs_vnode_get only guarantees that dnode will not
                 * be freed while we get a vnode for it.  It does not
                 * preserve any other invariants, so we must check
                 * whether the parent has been removed in the meantime.
                if (tmpfs_rmdired_p(vp)) {
                        printf("tmpfs_vnode_get lost race with rmdir\n");
                        return ENOENT;
 Can you try this and see how dirconc goes?  In particular, please let
 me know if you hit the printf.
    What will happen if I've a reference to a directory (e.g. current-dir) 
    and someone delete it and the parent of it too?
 Then you'll have a reference to the vnode for a deleted directory.
 But if you have the vnode locked, then nothing can delete it.
    I've seen no code that will destroy the link between the vnode and the 
    node structure till now, but I assume it is present somewhere.
 This happens in tmpfs_reclaim, when the vnode ceases to be referenced.
    As a second action all WHITEOUT children of the just removed directory 
    are simply deleted during directory delete operation. (calls to 
    tmpfs_dir_detach and tmpfs_free_dirent without any checks for vnode 
    If there are still vnodes associated to the nodes in the dirents, is 
    this code save?
 WHITEOUT stuff is not relevant unless you are using onionfs, so don't
 worry about that logic.
    Just as information:
    The last problem I've fixed before was the sync with the cache after 
    rename occured.
    In Head the call to genfs_rename_cache_purge() was commented out in 
    5.1 seems to need it, because I've ended up with having the object in 
    the correct tmpfs-structure, but when accessing it in the cache it is 
    not found.
    Symtom: "ls <dir>" will show the name but "ls -l <dir>" say "file not 
 This is not surprising.  It's probably safe to uncomment all of the
 commented cache purges that I left scattered around -- the only reason
 I didn't do that was to preserve the existing behaviour and to remind
 me that the namecache needs careful analysis which I haven't done.
 But as a stop-gap before that analysis is done, overzealously purging
 the cache may improve correctness and will probably at worst only hurt

Home | Main Index | Thread Index | Old Index