tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

solved ? [Re: need help with kern/35704 (UBC-related)]



On Tue, Feb 02, 2010 at 06:10:59PM +0100, Manuel Bouyer wrote:
> I made some progress on this.
> It seems that UFS_TRUNCATE() isn't doing anything UBC-related in this case,
> the pages allocated do still contain data previsouly written (the write
> error doesn't occur on a page boundary).
> 
> I went back looking at ufs_balloc_range(). For each 5-byte write
> ufs_balloc_range() returns the same physical page (except when crossing
> a page boundary). I suspect releasing/invalidating the pages in case of
> fs block allocation failure is rude, because these pages contains valid data
> that have not yet been flushed.
> So I came with the attached patch, which fixes the data block corruption for
> me. I have no idea if it's correct in all case, at last this should point
> where the issue is. I also wonder if the "PG_RELEASED | PG_CLEAN" in
> the retry case can't cause the same corruption.
> 
> Now I'm seeing another kind of corruption, which looks a lot like what I've
> seen on my NFS servers:
> the data block of the directory where I'm doing the test (which happens
> to be the root directory of the fs and also the only directory present
> on this fs) is zeroed out after the disk block allocation failure
> (I checked this, the disk block really contains all 0).
> I'll try to track down this one as well.

I found the cause of for this one:
Before extending the file, ffs_write() does a uvm_vnp_setwritesize(new_size),
which sets vp->v_writesize without changing vp->v_size.
If later disk block allocation fail (or other error occurs while writing)
ffs_trucate() is called with the old size. But if error was caused by
ufs_balloc_range(), uvm_vnp_setsize() has never been called; also ip->i_size
still has the old inode size (as no new disk block have been allocated).
So ffs_trucate() does a simple ffs_update(), and v_writesize remains with
a too large value.
Later genfs_do_io() will use v_writesize to find the end of file,
and will write past the real end of the file, overwritting whatever is
after the last fragment (in my case, the directory's data block).

To fix this I propose to have ffs_trucate() (and derivatives) always set 
v_writesize, even if the real size of the inode didn't change.
The attached patch completely fixes the test case from kern/35704
for me. I suspect it could also fix other related file system full
related PRs.

Anyone has commnts about this ? I'd still like to hear some from
UVM/UBC experts ...

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Index: ffs/ffs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_inode.c,v
retrieving revision 1.99
diff -u -p -u -r1.99 ffs_inode.c
--- ffs/ffs_inode.c     30 Aug 2008 08:25:53 -0000      1.99
+++ ffs/ffs_inode.c     2 Feb 2010 21:50:33 -0000
@@ -242,6 +242,8 @@ ffs_truncate(struct vnode *ovp, off_t le
                return (ffs_update(ovp, NULL, NULL, 0));
        }
        if (oip->i_size == length) {
+               /* still do a uvm_vnp_setsize() as writesize may be larger */
+               uvm_vnp_setsize(ovp, length);
                oip->i_flag |= IN_CHANGE | IN_UPDATE;
                return (ffs_update(ovp, NULL, NULL, 0));
        }
Index: ufs/ufs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision 1.76.4.1
diff -u -p -u -r1.76.4.1 ufs_inode.c
--- ufs/ufs_inode.c     8 Feb 2009 19:08:23 -0000       1.76.4.1
+++ ufs/ufs_inode.c     2 Feb 2010 21:50:33 -0000
@@ -328,11 +328,11 @@ ufs_balloc_range(struct vnode *vp, off_t
        GOP_SIZE(vp, off + len, &eob, 0);
        mutex_enter(&uobj->vmobjlock);
        for (i = 0; i < npages; i++) {
-               if (error) {
-                       pgs[i]->flags |= PG_RELEASED;
-               } else if (off <= pagestart + (i << PAGE_SHIFT) &&
+               if (off <= pagestart + (i << PAGE_SHIFT) &&
                    pagestart + ((i + 1) << PAGE_SHIFT) <= eob) {
                        pgs[i]->flags &= ~PG_RDONLY;
+               } else if (error) {
+                       pgs[i]->flags |= PG_RELEASED;
                }
        }
        if (error) {
Index: lfs/lfs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/lfs/lfs_inode.c,v
retrieving revision 1.120
diff -u -p -u -r1.120 lfs_inode.c
--- lfs/lfs_inode.c     28 Apr 2008 20:24:11 -0000      1.120
+++ lfs/lfs_inode.c     2 Feb 2010 21:50:54 -0000
@@ -226,8 +226,11 @@ lfs_truncate(struct vnode *ovp, off_t le
        /*
         * Just return and not update modification times.
         */
-       if (oip->i_size == length)
+       if (oip->i_size == length) {
+               /* still do a uvm_vnp_setsize() as writesize may be larger */
+               uvm_vnp_setsize(ovp, length);
                return (0);
+       }
 
        if (ovp->v_type == VLNK &&
            (oip->i_size < ump->um_maxsymlinklen ||
Index: ext2fs/ext2fs_inode.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ext2fs/ext2fs_inode.c,v
retrieving revision 1.66
diff -u -p -u -r1.66 ext2fs_inode.c
--- ext2fs/ext2fs_inode.c       16 May 2008 09:22:00 -0000      1.66
+++ ext2fs/ext2fs_inode.c       2 Feb 2010 21:50:54 -0000
@@ -277,6 +277,8 @@ ext2fs_truncate(struct vnode *ovp, off_t
                return (ext2fs_update(ovp, NULL, NULL, 0));
        }
        if (ext2fs_size(oip) == length) {
+               /* still do a uvm_vnp_setsize() as writesize may be larger */
+               uvm_vnp_setsize(ovp, length);
                oip->i_flag |= IN_CHANGE | IN_UPDATE;
                return (ext2fs_update(ovp, NULL, NULL, 0));
        }


Home | Main Index | Thread Index | Old Index