NetBSD-Bugs archive


kern/44568: WAPBL doesn't play nice with snapshots



>Number:         44568
>Category:       kern
>Synopsis:       WAPBL doesn't play nice with snapshots
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 14 16:15:00 +0000 2011
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.99.45
>Organization:
>Environment:
System: NetBSD java 5.99.45 NetBSD 5.99.45 (GENERIC) #0: Thu Feb 10 05:03:13 
UTC 2011  
builds%b7.netbsd.org@localhost:/home/builds/ab/HEAD/amd64/201102100300Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC
 amd64
Architecture: x86_64
Machine: x86_64
>Description:
        Taking a persistent snapshot of a 500GB WAPBL-enabled ffs filesystem
        panics with:
panic: wapbl_flush: current transaction too big to flush.
I've seen different stack traces, but ffs_sync() is in each of them,
called either from the VOP_FSYNC() call in ffs_snapshot() or from
sched_fsync().


>How-To-Repeat:
        Assuming /home is a 500GB ffs filesystem mounted rw,log:
        fssconfig fss0 /home /home/snap
>Fix:
        WAPBL transactions need to be split. The patch below makes
        things better for me, but it still panics on the second snapshot.
        There are other suspicious places, like snapshot_expunge(), which seems
        to do a lot of work inside a single transaction (maybe
        we could start/end the transaction inside the loop instead?).
        I also get the same panic when rm'ing the snapshot file,
        from ufs_inactive().
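
        For illustration only, here is a small userland sketch of the
        splitting idea: end and restart the transaction every N loop
        iterations so that no single transaction outgrows the log. It is
        not kernel code. mock_begin()/mock_end() are hypothetical stand-ins
        for UFS_WAPBL_BEGIN()/UFS_WAPBL_END(), and LOG_CAPACITY is an
        assumed limit, not the real WAPBL sizing rule.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numbers for the sketch; real WAPBL limits are computed differently. */
#define LOG_CAPACITY 32  /* max blocks a single transaction may hold */
#define BATCH        16  /* restart the transaction every BATCH blocks */

/* Hypothetical model of a journal; not the kernel's struct wapbl. */
struct mock_log {
	int in_txn;     /* blocks in the currently open transaction */
	int max_txn;    /* largest transaction committed so far */
	int committed;  /* number of transactions committed */
};

/* Stand-in for UFS_WAPBL_BEGIN(): opens an empty transaction. */
static int
mock_begin(struct mock_log *lg)
{
	assert(lg != NULL);
	lg->in_txn = 0;
	return 0;
}

/* Stand-in for UFS_WAPBL_END(): commits the open transaction. */
static void
mock_end(struct mock_log *lg)
{
	if (lg->in_txn > lg->max_txn)
		lg->max_txn = lg->in_txn;
	lg->committed++;
}

/*
 * Write nblocks blocks inside journal transactions.
 * batch == 0 keeps everything in one transaction; batch > 0 splits it.
 * Returns 0 on success, -1 where the real kernel would panic with
 * "current transaction too big to flush".
 */
static int
write_blocks(struct mock_log *lg, int nblocks, int batch)
{
	int error, loc;

	error = mock_begin(lg);
	if (error)
		return error;
	for (loc = 0; loc < nblocks; loc++) {
		lg->in_txn++;                 /* one more block joins the transaction */
		if (lg->in_txn > LOG_CAPACITY)
			return -1;
		if (batch > 0 && ((loc + 1) % batch) == 0) {
			mock_end(lg);         /* commit what we have so far */
			error = mock_begin(lg);
			if (error)
				return error;
		}
	}
	mock_end(lg);
	return 0;
}
```

        In this model, write_blocks(&lg, 100, 0) fails once the single
        transaction exceeds LOG_CAPACITY, while write_blocks(&lg, 100, BATCH)
        succeeds and never holds more than BATCH blocks in one transaction.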

Index: sys/ufs/ffs/ffs_snapshot.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_snapshot.c,v
retrieving revision 1.102.4.2
diff -u -p -u -r1.102.4.2 ffs_snapshot.c
--- sys/ufs/ffs/ffs_snapshot.c  12 Feb 2011 21:48:09 -0000      1.102.4.2
+++ sys/ufs/ffs/ffs_snapshot.c  14 Feb 2011 16:01:27 -0000
@@ -489,6 +489,12 @@ snapshot_setup(struct mount *mp, struct 
                if (error)
                        goto out;
                bawrite(nbp);
+               if ((loc % 16) == 0) {
+                       UFS_WAPBL_END(mp);
+                       error = UFS_WAPBL_BEGIN(mp);
+                       if (error)
+                               return error;
+               }
        }
 
 out:
@@ -825,6 +831,12 @@ snapshot_writefs(struct mount *mp, struc
                memcpy(bp->b_data, space, fs->fs_bsize);
                space = (char *)space + fs->fs_bsize;
                bawrite(bp);
+               if (((loc + 1) % 16) == 0) {
+                       UFS_WAPBL_END(mp);
+                       error = UFS_WAPBL_BEGIN(mp);
+                       if (error)
+                               return error;
+               }
        }
        if (error)
                goto out;
@@ -892,6 +904,12 @@ cgaccount(struct vnode *vp, int passno, 
                bawrite(nbp);
                if (error)
                        break;
+               if (((cg + 1) % 16) == 0) {
+                       UFS_WAPBL_END(vp->v_mount);
+                       error = UFS_WAPBL_BEGIN(vp->v_mount);
+                       if (error)
+                               return error;
+               }
        }
        UFS_WAPBL_END(vp->v_mount);
        return error;


