The following reply was made to PR kern/46990; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/46990: tmpfs_gro_genealogy() panic - rename/rmdir race
condition not handled
Date: Fri, 21 Sep 2012 13:10:26 +0200
Hi, of course you can have them.
I've already asked whether anyone is interested - see PR 46955.
The attached tar file contains all modified files.
It is mainly the version from HEAD with some "missing" pieces from
kauth and genfs copied directly into the tmpfs sources.
I don't want to upgrade all the other parts too, because then I would
have to change everything ...
I don't expect to win the "best integration of the year" award for this
work. I simply need a stable version for our production systems running
5.1.x - and currently they lock up in tmpfs from time to time.
(Remark: as far as I remember, one kauth change is not marked with "#if
0 // not in 5.1" in the lookup code - a diff against the version from
HEAD will show it.)
Now to the problem itself.
OK - you are right: tmpfs_gro_lock_directory() checks the parent link -
I had overlooked that, sorry.
But I added a printf() statement to my new error handling path for the
case where dnode is NULL, and it fired!
The earlier analysis with DDB also showed that dnode is NULL. The
(manually) reconstructed call path (unfortunately mutex_enter() is
assembler code without a frame pointer ...) says that VOP_RENAME calls
genfs_insane_rename, which calls tmpfs_sane_rename, which calls
genfs_sane_rename, which calls the function from the ops table at
offset 0x40 (this is tmpfs_gro_genealogy - not shown in the DDB
backtrace due to the missing frame pointer in mutex_enter()), and that
it crashes inside the call mutex_enter(&dnode->tn_vlock).
I cannot say whether it is the first loop pass or a later one - I
assumed the first one in my report ... my fault.
If we can exclude the first loop pass, then the "problem" must happen
while walking up the parent chain.
We vput(vp) after taking tn_vlock on the parent - so we no longer hold
a lock on any vnode.
The normal way is to take tn_vlock and then search for the vnode by
calling tmpfs_vnode_get(). There we have two different ways of handling
the locks (see the sketch below):
- If the vnode is present, we take its v_interlock, release tn_vlock
and call vget().
- If the vnode is missing, we allocate one, call vn_lock() on it and
then release tn_vlock.
The second path is OK in any case, but I think it is not the one taken
in our case here.
If the vnode is "present", then another process may have locked that
vnode and may e.g. be trying to get to "our" vnode. When we release it,
that process will get it. At that point we hold "only" the tn_vlock on
the parent.
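To illustrate the first path and the window it opens, here is a
condensed sketch against the netbsd-5 interfaces (where v_interlock is
still an embedded kmutex_t). The helper name vnode_get_present_path is
mine; this approximates the relevant part of tmpfs_vnode_get(), it is
not the literal source:

/*
 * The "vnode already present" path: called with node->tn_vlock held,
 * which is released on all paths.
 */
static int
vnode_get_present_path(struct tmpfs_node *node, struct vnode **vpp)
{
	struct vnode *vp = node->tn_vnode;
	int error;

	KASSERT(mutex_owned(&node->tn_vlock));
	KASSERT(vp != NULL);

	mutex_enter(&vp->v_interlock);
	mutex_exit(&node->tn_vlock);

	/*
	 * vget() releases the interlock and may sleep waiting for the
	 * vnode lock.  During that sleep we hold no locks at all, so a
	 * process that already holds the vnode lock is free to rmdir
	 * the directory before we get it.
	 */
	error = vget(vp, LK_EXCLUSIVE | LK_INTERLOCK);
	if (error)
		return error;

	*vpp = vp;
	return 0;
}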
I'm not sure whether it is possible for other processes to kill the old
directory vnode and then also kill the one whose tn_vlock we hold.
The only location where tn_parent gets killed is tmpfs_dir_detach(),
and that is called from tmpfs_gro_rename() (nobody can get there - the
VFS rename lock is held), tmpfs_gro_remove() (not used for directories)
and tmpfs_rmdir().
OK - it is always possible that someone somewhere else is throwing
bombs. I've "only" tested tmpfs with dirconc, so if there are bombs,
they may be present but just have not exploded yet. But I don't think
so.
By the way:
What happens if I hold a reference to a directory (e.g. as the current
directory) and someone deletes it, and then its parent too?
When a directory is deleted, it is either really deleted
(tmpfs_free_dirent) or set to WHITEOUT state (tmpfs_dir_attach again).
So far I have seen no code that destroys the link between the vnode and
the node structure, but I assume it exists somewhere.
As a second action, all WHITEOUT children of the just-removed directory
are simply deleted during the directory delete operation (calls to
tmpfs_dir_detach and tmpfs_free_dirent without any checks for vnode
references).
If there are still vnodes associated with the nodes in those dirents,
is this code safe?
I'm not sure, but I have no time to dig deeper there.
Perhaps the process is suspended long enough that the dirent structure
gets reused and removed (as WHITEOUT) before it continues ...
(I'm running dirconc ...)
The system where I tested my patch is idle at the moment, so I can run
other tests if desired.
I will enable KASSERT() and try to get more information out of it.
Unfortunately, enabling all KASSERTs in the kernel will change the
timing behaviour in a way that may keep the problem from showing up
again.
Just for information:
The last problem I fixed before this one was keeping the name cache in
sync after a rename.
In HEAD the call to genfs_rename_cache_purge() is commented out in
tmpfs_gro_rename(). 5.1 seems to need it: I ended up with the object in
the correct tmpfs structure, but a lookup through the name cache could
not find it.
Symptom: "ls <dir>" shows the name, but "ls -l <dir>" says "file not
found".
Either this is wrong in HEAD too, or there are changes in the name
cache that I have overlooked. Nevertheless, it should always be legal
to evict some "random" entries from the cache, so adding the purge
shouldn't break anything - perhaps I lose some performance, but I can
live with that.
But I don't think this is relevant to this problem.
If you prefer that I try it on HEAD instead, let me know the location
of the source tar files on an FTP server that I should use for
installation. I can (cross-)compile them on another system and install
the resulting binaries over the network.
Best regards
W. Stukenbrock
Taylor R Campbell wrote:
> Date: Thu, 20 Sep 2012 14:50:01 +0000
> From: Wolfgang.Stukenbrock%nagler-company.com@localhost
>
> The comment in tmpfs_rename.c tmpfs_gro_genealogy() that dnode may
> not be NULL is simply not true. Need to check if dnode is NULL and
> return an error (e.g. ENOENT) instead.
>
> I'm not sure this analysis is correct. If tmpfs_rmdired_p(vp) is
> false, then VP_TO_TMPFS_NODE(vp)->tn_spec.tn_dir.tn_parent is
> guaranteed to be nonnull, provided that vp is locked. Can you enable
> DIAGNOSTIC and see what earlier kassert fires? Also, have you
> reproduced the problem in HEAD too?
>
> I suspect that this not a rename locking issue in particular so much
> as tmpfs locking issue in general -- if I recall correctly, after
> netbsd-5, rmind rototilled the tmpfs locking some time before I
> committed changes to fix tmpfs_rename last year.
>
> Can we see your patches to back-port tmpfs_rename?
>