tech-kern archive


Re: Vnode API change: add global vnode cache



   Date: Mon, 7 Apr 2014 15:51:00 +0200
   From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>

   On 07 Apr 2014, at 03:22, Taylor R Campbell 
<campbell+netbsd-tech-kern%mumble.net@localhost> wrote:

   > Is the plan to nix getnewvnode and ungetnewvnode?  It would be good to
   > avoid long-term duplication of its body.  But I suspect we'll want to
   > keep those for, e.g., tmpfs, in which case this should use getnewvnode
   > and ungetnewvnode instead of vnalloc/setup/teardown/vnfree.

   Not sure if we have to keep getnewvnode() in the long term.

   It cannot be used here as we don't want the partially initialised
   vnode (before VFS_LOAD_NODE) on the mnt_vnodelist.

   If we end up replacing getnewvnode() with a two-stage allocator that
   initialises in the first stage and finalises in the second stage,
   we may use it here.

In that case, could you set the VI_CHANGING bit in vp, rather than
setting node->vn_vnode = NULL in the vcache node, to indicate that
the vnode is being initialized in vcache_intern?  That way,

(a) it is safe to put the vnode on the mount list, since anyone
trying to use it from there must go through vget first;

(b) there is only one place where waits happen, namely vget (vwait,
sketched below), so there is no need for vcache.cv or some of the
complex loop logic in vcache_intern;

(c) tmpfs can mirror the vcache_intern logic without more duplication
of code from getnewvnode; and

(d) as a tiny added bonus, omitting vcache.cv will make the vcache
struct fit in a single cache line on, e.g., amd64.
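
For reference, that wait already exists: vget, called with
v_interlock held, waits out flag bits via vwait, which is roughly
this (a from-memory sketch of sys/kern/vfs_vnode.c, not verbatim):

static void
vwait(vnode_t *vp, int flags)
{

        /* Sleep on the vnode's condvar until the flags clear.  */
        KASSERT(mutex_owned(vp->v_interlock));
        while ((vp->v_iflag & flags) != 0)
                cv_wait(&vp->v_cv, vp->v_interlock);
}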

Generally, we should have only one mechanism for weak references to
vnodes, and currently that's vget.  If we do want to take dholland's
idea of a separate structure for weak references to vnodes, then the
vnode mount list and anything else should use that too.  But I think
vget is good enough for now.

Also, it occurs to me that we're going to have not two but multiple
copies of initialization code that is currently in getnewvnode,
because in your API, some is copied in vcache_intern and some must be
copied in VFS_LOAD_NODE.

So I'm inclined to say that vcache_intern should itself look a little
more like getnewvnode, and perhaps VFS_LOAD_NODE should be VOP_LOAD
instead.  See the attached file for a sketch of this, and a tmpfs
version.

Except for the VI_CHANGING part, the vcache_intern I attached uses
nothing internal to the vnode abstraction.  To address VI_CHANGING, we
could make getnewvnode set it by default and require callers to use
some new routine, say vready, to clear it.  Then vcache_intern, or
clones of it like the one tmpfs will want, can live entirely outside
the vnode API.
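
Concretely, vready could be just the wakeup tail of the attached
vcache_intern (hypothetical routine, under the naming suggested
above):

void
vready(struct vnode *vp)
{

        /* Clear VI_CHANGING, set by getnewvnode in this scheme,
         * and wake anyone stuck in vget/vwait.  */
        mutex_enter(vp->v_interlock);
        KASSERT(ISSET(vp->v_iflag, VI_CHANGING));
        vp->v_iflag &= ~VI_CHANGING;
        cv_broadcast(&vp->v_cv);
        mutex_exit(vp->v_interlock);
}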

   > Does anything use VFS_VGET in ufs now?  If so, why?  If not, this
   > should just fail.  (Eventually we ought to nix VFS_VGET, I think.)

   VFS_VGET() is used by NFS V3 readdirplus so we have to keep it.

Ugh bletch.  OK for now, but it would be nice if there were a better
way to do this.

Perhaps we ought at least to change VFS_VGET so it returns the vnode
unlocked, and once we do that, use it consistently to look up vnodes
by inode number rather than calling vcache_intern in multiple
different places.
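
Callers would then look something like this (a sketch, assuming
VFS_VGET comes to return the vnode referenced but unlocked):

        error = VFS_VGET(mp, ino, &vp);    /* vp referenced, unlocked */
        if (error)
                return error;
        error = vn_lock(vp, LK_EXCLUSIVE);
        if (error) {
                vrele(vp);
                return error;
        }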

   This is wrong; did you mean:

           error = vcache_intern(mp, &dotdot_ino, sizeof(dotdot_ino), &dvp);
           vput(vp);
           if (error)
                   return error;
           error = vn_lock(dvp, LK_EXCLUSIVE);
           if (error) {
                   vrele(dvp);
                   return error;
           }

Oops, yes.

   Anyway, not now.

OK.

int
vcache_intern(enum vtagtype tag, struct mount *mp, int (**vops)(void *),
    kmutex_t *interlock, const void *key, size_t key_len, struct vnode **vpp)
{
        struct vnode *vp;
        const void *permanent_key;
        struct vcache_node *node, *new_node;
        const struct vcache_key vcache_key = {
                .vk_mount = mp,
                .vk_key = key,
                .vk_key_len = key_len,
        };
        int error;

again:  /* Find and vget the vnode for this key if it exists.  */
        mutex_enter(&vcache.lock);
        node = rb_tree_find_node(&vcache.tree, &vcache_key);
        if (__predict_true(node != NULL)) {
                vp = node->vn_vnode;
                KASSERT(vp != NULL);
                mutex_enter(vp->v_interlock);
                mutex_exit(&vcache.lock);
                error = vget(vp, 0);
                if (__predict_true(error != ENOENT)) {
                        if (__predict_false(error))
                                vp = NULL;
                        goto out;
                }
                goto again;
        }
        mutex_exit(&vcache.lock);

        /*
         * None yet.  Create one, and mark it VI_CHANGING so that once
         * we publish it, anyone else trying to vget it will wait.
         */
        error = getnewvnode(tag, mp, vops, interlock, &vp);
        if (error)
                goto out;
        vp->v_iflag |= VI_CHANGING;
        new_node = pool_cache_get(vcache.pool, PR_WAITOK);
        new_node->vn_vnode = vp;
        new_node->vn_key = vcache_key;

        /* Publish it, unless someone else got in first.  */
        mutex_enter(&vcache.lock);
        node = rb_tree_insert_node(&vcache.tree, new_node);
        mutex_exit(&vcache.lock);
        if (__predict_false(node != new_node)) {
                ungetnewvnode(vp);
                pool_cache_put(vcache.pool, new_node);
                goto again;
        }

        /* Published.  Load it up from disk.  */
#if 0
        error = VOP_LOAD(vp, key, key_len, &permanent_key);
#else
        error = VFS_LOAD_NODE(mp, vp, key, key_len, &permanent_key);
#endif
        if (error) {
                mutex_enter(&vcache.lock);
                rb_tree_remove_node(&vcache.tree, new_node);
                mutex_exit(&vcache.lock);
                ungetnewvnode(vp);
                pool_cache_put(vcache.pool, new_node);
                vp = NULL;
                goto out;
        }

        /* Set up the permanent key.  */
        KASSERT(permanent_key != NULL);
        KASSERT(memcmp(key, permanent_key, key_len) == 0);
        mutex_enter(&vcache.lock);
        new_node->vn_key.vk_key = permanent_key;
        mutex_exit(&vcache.lock);

        /* Loaded and ready.  Wake any waiters and let the merriment begin.  */
        mutex_enter(vp->v_interlock);
        KASSERT(ISSET(vp->v_iflag, VI_CHANGING));
        vp->v_iflag &= ~VI_CHANGING;
        cv_broadcast(&vp->v_cv);
        mutex_exit(vp->v_interlock);

out:    *vpp = vp;
        KASSERT((error == 0) == (*vpp != NULL));
        KASSERT(error || (interlock == NULL) ||
            (vp->v_interlock == interlock));
        return error;
}

int
tmpfs_vnode_vget(struct mount *mp, struct tmpfs_node *tn, struct vnode **vpp)
{
        struct vnode *vp, *new_vp;
        int error;

again:  /* Get the vnode for this tmpfs node if it exists.  */
        mutex_enter(&tn->tn_vlock);
        if ((vp = tn->tn_vnode) != NULL) {
                mutex_enter(vp->v_interlock);
                mutex_exit(&tn->tn_vlock);
                error = vget(vp, 0);
                if (error != ENOENT) {
                        if (error)
                                vp = NULL;
                        goto out;
                }
                goto again;
        }
        mutex_exit(&tn->tn_vlock);

        /*
         * Not yet.  Create one, and mark it VI_CHANGING so that once
         * we publish it, anyone else trying to vget it will wait.
         */
        error = getnewvnode(VT_TMPFS, mp, tmpfs_vnodeop_p,
            (tn->tn_type == VREG? tn->tn_spec.tn_reg.tn_aobj->vmobjlock
                : NULL),
            &new_vp);
        if (error)
                goto out;
        new_vp->v_iflag |= VI_CHANGING;

        /* Publish it, unless someone else got in first.  */
        mutex_enter(&tn->tn_vlock);
        if ((vp = tn->tn_vnode) != NULL) {
                mutex_exit(&tn->tn_vlock);
                ungetnewvnode(new_vp);
                goto again;
        }
        vp = tn->tn_vnode = new_vp;
        mutex_exit(&tn->tn_vlock);

        /* Published.  Set it up.  */
        /* XXX Perhaps move this to a VOP_LOAD/VFS_LOAD_NODE.  */
        switch (tn->tn_type) {
        case VBLK:
        case VCHR:
                vp->v_op = tmpfs_specop_p;
                spec_node_init(vp, tn->tn_spec.tn_dev.tn_rdev);
                break;
        case VDIR:
                vp->v_vflag |= (tn->tn_spec.tn_dir.tn_parent == tn? VV_ROOT
                    : 0);
                break;
        case VFIFO:
                vp->v_op = tmpfs_fifoop_p;
                break;
        case VLNK:
        case VREG:
        case VSOCK:
                break;
        default:
                KASSERT(false);
        }
        uvm_vnp_setsize(vp, tn->tn_size);
        vp->v_data = tn;

        /* Ready.  Wake any waiters and let the merriment begin.  */
        mutex_enter(vp->v_interlock);
        KASSERT(ISSET(vp->v_iflag, VI_CHANGING));
        vp->v_iflag &= ~VI_CHANGING;
        cv_broadcast(&vp->v_cv);
        mutex_exit(vp->v_interlock);

out:    *vpp = vp;
        KASSERT((error == 0) == (*vpp != NULL));
        return error;
}
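
(For what it's worth, given the vcache_intern sketch above, a
ufs-style lookup by inode number would presumably shrink to a
hypothetical wrapper like this, where ffs_vnodeop_p and VT_UFS are
the existing ffs vnode ops vector and tag:

int
ffs_vget_by_ino(struct mount *mp, ino_t ino, struct vnode **vpp)
{

        /* Hypothetical: the inode number is the cache key; the
         * permanent key is set up by the load hook, as above.  */
        return vcache_intern(VT_UFS, mp, ffs_vnodeop_p, NULL,
            &ino, sizeof(ino), vpp);
}

modulo whatever locking convention we settle on for the returned
vnode.)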

