tech-kern archive


ZFS vs NetBSD vnode recycling



Hi folks,

I spent yesterday debugging the ZFS vnode recycling problem on i386, and I have found the cause of the "locking against myself" panics which I have seen while testing ZFS on i386.

The ZFS mount structure contains an array of mutexes [1] called z_hold_mtx, which are locked by the ZFS_OBJ_HOLD_ENTER function [2]. ZFS_OBJ_HOLD_ENTER takes two arguments: the ZFS mount structure and a znode (ZFS inode) id. The mutex is selected by hashing the object number into the array of 64 mutexes, so two different znode ids can map to the same mutex. The panic happens when zfs_mknode [3] holds the mutex for an id which hashes to the same mutex as another znode id that is already in use. If that old vnode is picked by getcleanvnode->vclean->VOP_RECLAIM->zfs_netbsd_reclaim->zfs_zinactive, zfs_zinactive tries to lock the same mutex that was already locked in zfs_mknode.

I think there are three possible solutions:

1) Defer the call to zfs_zinactive to the system taskq, which can destroy the znode later. I have tested this version and it works, but I'm getting a deadlock. I need to investigate whether it is caused by my change or not.

2) Some sort of vreclaim patch (disable vnode recycling and just call vnalloc in getnewvnode), done right this time. I'm willing to do it, but I'm not sure how it should be done.

3) Do it like FreeBSD does. They do almost the same as option 1, but differently:

static void
zfs_reclaim_complete(void *arg, int pending)
{
        znode_t *zp = arg;
        zfsvfs_t *zfsvfs = zp->z_zfsvfs;

        ZFS_LOG(1, "zp=%p", zp);
        ZFS_OBJ_HOLD_ENTER(zfsvfs, zp->z_id);
        zfs_znode_dmu_fini(zp);
        ZFS_OBJ_HOLD_EXIT(zfsvfs, zp->z_id);
        zfs_znode_free(zp);
}

static int
zfs_freebsd_reclaim(ap)
        struct vop_reclaim_args /* {
                struct vnode *a_vp;
                struct thread *a_td;
        } */ *ap;
{
        vnode_t *vp = ap->a_vp;
        znode_t *zp = VTOZ(vp);
        zfsvfs_t *zfsvfs;

        ASSERT(zp != NULL);

        /*
         * Destroy the vm object and flush associated pages.
         */
        vnode_destroy_vobject(vp);

        mutex_enter(&zp->z_lock);
        ASSERT(zp->z_phys);
        ZTOV(zp) = NULL;

        if (!zp->z_unlinked) {
                int locked;

                zfsvfs = zp->z_zfsvfs;
                mutex_exit(&zp->z_lock);
                locked = MUTEX_HELD(ZFS_OBJ_MUTEX(zfsvfs, zp->z_id)) ? 2 :
                    ZFS_OBJ_HOLD_TRYENTER(zfsvfs, zp->z_id);
                if (locked == 0) {
                        /*
                         * Lock can't be obtained due to deadlock possibility,
                         * so defer znode destruction.
                         */
                        TASK_INIT(&zp->z_task, 0, zfs_reclaim_complete, zp);
                        taskqueue_enqueue(taskqueue_thread, &zp->z_task);
                } else {
                        zfs_znode_dmu_fini(zp);
                        if (locked == 1)
                                ZFS_OBJ_HOLD_EXIT(zfsvfs, zp->z_id);
                        zfs_znode_free(zp);
                }
        } else {
                mutex_exit(&zp->z_lock);
        }
        VI_LOCK(vp);
        vp->v_data = NULL;
        ASSERT(vp->v_holdcnt >= 1);
        VI_UNLOCK(vp);
        return (0);
}

Do you have any suggestions?


[1] http://nxr.aydogan.net/source/xref/src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_vfsops.h#z_hold_mtx
[2] http://nxr.aydogan.net/source/xref/src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_znode.h#ZFS_OBJ_HOLD_ENTER
[3] http://nxr.aydogan.net/xref/src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_znode.c#830

Regards

Adam.


