tech-kern archive


Re: Blocking vcache_tryvget() across VOP_INACTIVE() - unneeded



On 1/15/20, Andrew Doran <ad%netbsd.org@localhost> wrote:
> I don't understand why we do this.  I don't think it's needed at all
> because if the file really is deleted and the inode is getting cleaned
> out, then it shouldn't be getting new refs in any of the usual ways and
> we don't need to worry about it.  And in any case the vnode lock
> prevents anyone from messing with the inode during VOP_INACTIVE().
>
> I want to change this because it causes nasty problems on MP.  For
> example I see threads very regularly thrown out of the namecache by
> vcache_tryvget(), and when they finally get the vnode ref'd and locked
> they do it by digging through the file system metadata (when everything
> is already in cache).
>
>       http://www.netbsd.org/~ad/inactive.diff
>

I think looking at the bigger picture suggests a different solution, one
which also keeps the current blocked <-> loaded transition around for
consideration later.

I'm assuming the goal for the foreseeable future is to achieve path lookup
with shared vnode locks.

Since the ->v_usecount transition to 0 happens all the time and
VOP_INACTIVE processing is almost never needed, the current requirement
to hold an exclusive lock just to find out there is nothing to do should
be lifted.  Apart from getting away with only a shared lock (or no lock
at all), this would also naturally eliminate vnode state changes for the
common case.

Preferably the vfs layer would provide a method for filesystems to call
to signify that they want VOP_INACTIVE to be called for a given vnode;
then vrelel could just test a bit.  However, this is probably not worth
the effort right now.
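
A minimal sketch of that idea, purely for illustration: VI_WANTINACTIVE
and vwantinactive() are invented names, not existing NetBSD interfaces,
and the flag is assumed to be protected by v_interlock like the other
v_iflag bits.

/*
 * Sketch only: a filesystem would set the (made-up) VI_WANTINACTIVE
 * bit when it knows VOP_INACTIVE() will have real work to do, e.g.
 * when the link count drops to zero in VOP_REMOVE().
 */
void
vwantinactive(vnode_t *vp)
{

	mutex_enter(vp->v_interlock);
	vp->v_iflag |= VI_WANTINACTIVE;		/* hypothetical flag */
	mutex_exit(vp->v_interlock);
}

vrelel() could then leave the vnode in the loaded state for the common
case, roughly:

	if (VSTATE_GET(vp) == VS_RECLAIMED ||
	    (vp->v_iflag & VI_WANTINACTIVE) == 0) {
		VOP_UNLOCK(vp);
	} else {
		VSTATE_CHANGE(vp, VS_LOADED, VS_BLOCKED);
		/* existing VOP_INACTIVE() path */
	}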

Said locking was also a problem in FreeBSD.  It got temporarily sorted
out with the introduction of VOP_NEED_INACTIVE: by default it returns 1,
but filesystems which provide their own method easily avoid the problem.
I think starting out by requiring at least a shared lock would do the
trick just fine, while still allowing race-free methods at least for ufs
and tmpfs.
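
For ufs the predicate could look roughly like below.  ufs_need_inactive()
and the vop_need_inactive_args structure don't exist in NetBSD today (the
latter would only appear once the vop is added), and the exact set of
conditions and their locking would need auditing; the point is just that
the check only reads inode fields.

/*
 * Sketch only: return 1 if VOP_INACTIVE() has work to do for this
 * vnode, 0 if vrelel() can drop the reference without deactivating.
 */
int
ufs_need_inactive(void *v)
{
	struct vop_need_inactive_args /* {
		struct vnode *a_vp;
	} */ *ap = v;
	struct inode *ip = VTOI(ap->a_vp);

	/* Removed file: truncation and inode freeing are still pending. */
	if (ip->i_mode == 0 || ip->i_nlink <= 0)
		return 1;
	/* Metadata updates that ufs_inactive() would flush. */
	if ((ip->i_flag & (IN_CHANGE | IN_UPDATE | IN_MODIFIED)) != 0)
		return 1;
	return 0;
}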

Then vrelel could stick to:
diff --git a/sys/kern/vfs_vnode.c b/sys/kern/vfs_vnode.c
index d05d5180f730..45d80dab4a2c 100644
--- a/sys/kern/vfs_vnode.c
+++ b/sys/kern/vfs_vnode.c
@@ -774,7 +774,7 @@ vrelel(vnode_t *vp, int flags, int lktype)
         * If not clean, deactivate the vnode, but preserve
         * our reference across the call to VOP_INACTIVE().
         */
-       if (VSTATE_GET(vp) == VS_RECLAIMED) {
+       if (VSTATE_GET(vp) == VS_RECLAIMED || !VOP_NEED_INACTIVE(vp)) {
                VOP_UNLOCK(vp);
        } else {
                VSTATE_CHANGE(vp, VS_LOADED, VS_BLOCKED);

which even with exclusive locks already avoids the blocked state problem.
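
Filesystems that don't implement the new vop could get a default which
just returns 1 and thus preserves today's behaviour, something along
these lines (genfs_need_inactive is only a suggested name, not an
existing routine):

int
genfs_need_inactive(void *v)
{

	/* Unconverted filesystems: claim deactivation is always needed. */
	return 1;
}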

Another option is to move some of the current body into smaller helpers
and then let filesystems piece together their own VOP_VRELE.
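
To illustrate, a filesystem-assembled VOP_VRELE might look roughly like
this; vrele_drop_ref(), vrele_deactivate() and vrele_finish() are
invented stand-ins for the pieces of vrelel() that would have to be
exported, and the tmpfs check is only an example.

int
tmpfs_vrele(struct vnode *vp)
{

	/* Fast path: not the last reference, nothing further to do. */
	if (!vrele_drop_ref(vp))
		return 0;

	/* Last reference: decide locally whether deactivation is needed. */
	if (VP_TO_TMPFS_NODE(vp)->tn_links == 0)
		vrele_deactivate(vp);	/* VOP_INACTIVE() and friends */
	return vrele_finish(vp);	/* hand the vnode back to the vfs */
}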

-- 
Mateusz Guzik <mjguzik gmail.com>


