tech-kern archive


Re: need help with kern/35704 (UBC-related)

On Sun, Jan 31, 2010 at 11:33:47PM +0100, Manuel Bouyer wrote:
> hi,
> I need some help with the reproducible issue reported in kern/35704.
> This also happens with NetBSD 5.0_STABLE, and filling up the filesystem
> isn't required: being overquota is enough.
> I reproduced it with:
> echo aaaa > m
> while cat m >> p; do done
> (so it's not an issue with cat reading and writing the same file).
> When the while loop ends, p contains garbage. Also I wonder if this
> can cause filesystem corruption, because on reboot I occasionally get:
> panic: kernel debugging assertion "(vp->v_iflag & VI_ONWORKLST)" failed: file "/dsk/l1/misc/bouyer/netbsd-5/src/sys/miscfs/genfs/genfs_io.c", line 1017
> when unmounting the root filesystem (which is not the one with m and p).
> What I found so far is that ffs_write() is called with small incremental
> writes of 5 chars each. At one point it needs a new fragment to
> extend the file, and ufs_balloc_range() returns an error (as expected).
> Then ffs_write() calls ffs_truncate() with the old size.
> Here's a kernel output with some printfs I added:
> ffs_write(0xffffa000134073e8, osize 6120, uio->uio_offset 6120, uio_resid 5 nsize 6125)
> ffs_write(0xffffa000134073e8, osize 6125, uio->uio_offset 6125, uio_resid 5 nsize 6130)
> ffs_write(0xffffa000134073e8, osize 6130, uio->uio_offset 6130, uio_resid 5 nsize 6135)
> ffs_write(0xffffa000134073e8, osize 6135, uio->uio_offset 6135, uio_resid 5 nsize 6140)
> ffs_write(0xffffa000134073e8, osize 6140, uio->uio_offset 6140, uio_resid 5 nsize 6145)
> chkdq error 69
> ffs_write ufs_balloc_range(0xffffa000134073e8, 6140, 5) failed
> ffs_write UFS_TRUNCATE(6140)
> ufs_balloc_range() allocates the pages backing the range by which the
> file will be extended, and does so before trying to allocate disk
> blocks. If the block allocation fails, the pages are marked
> PG_RELEASED (otherwise PG_RDONLY is removed) and uvm_page_unbusy()
> is called. However, as this is not crossing a page boundary,
> I guess no new page is really allocated, and
> the ones that were used for previous writes to the file are reused.
> I suspect the way ufs_balloc_range() handles the failure causes the
> pages to be unmapped without being flushed, but I have no idea where
> the garbage comes from. It obviously comes from some other files
> open in the system (I got some ELF headers in there for example).
> When UFS_TRUNCATE() is called, we have v_writesize set to 6145 while
> v_size is still 6140.

I made some progress on this.
It seems that UFS_TRUNCATE() isn't doing anything UBC-related in this case:
the allocated pages still contain the previously written data (the write
error doesn't occur on a page boundary).
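
To make the page-boundary remark concrete, here is a small userland sketch of
the offset arithmetic. It assumes 4 KiB pages (a shift of 12), which is an
assumption and not something stated in this thread; page_index() and
crosses_page_boundary() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The real PAGE_SHIFT is port-specific. */
#define SKETCH_PAGE_SHIFT 12

/* Index of the page that contains byte offset `off` of the file. */
static int64_t
page_index(int64_t off)
{
	assert(off >= 0);
	return off >> SKETCH_PAGE_SHIFT;
}

/* Nonzero if a write of `len` bytes at `off` touches more than one page. */
static int
crosses_page_boundary(int64_t off, int64_t len)
{
	assert(len > 0);
	return page_index(off) != page_index(off + len - 1);
}
```

Under that assumption, every write in the trace above (offsets 6120 to 6144)
lands in page index 1, including the failing 5-byte write at offset 6140, so
the page involved is one that already holds data from the earlier writes.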

I went back looking at ufs_balloc_range(). For each 5-byte write
ufs_balloc_range() returns the same physical page (except when crossing
a page boundary). I suspect releasing/invalidating the pages in case of
fs block allocation failure is rude, because these pages contain valid data
that has not yet been flushed.
So I came up with the attached patch, which fixes the data block corruption
for me. I have no idea if it's correct in all cases; at least this should
point to where the issue is. I also wonder if the "PG_RELEASED | PG_CLEAN" in
the retry case could cause the same corruption.
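
For readers who want to see what the reordering in the attached patch changes,
here is a userland model of the per-page decision before and after. It is only
a sketch of the branch ordering shown in the diff: the SK_* flag values are
placeholders (not the kernel's PG_* values), 4 KiB pages are assumed, and
page_fully_covered(), flags_old() and flags_patched() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; not the kernel's definitions. */
#define SK_PAGE_SHIFT  12
#define SK_PG_RDONLY   0x1u
#define SK_PG_RELEASED 0x2u

/* The range test from the diff: is page i entirely inside [off, eob)? */
static int
page_fully_covered(int64_t off, int64_t eob, int64_t pagestart, int i)
{
	assert(i >= 0);
	int64_t start = pagestart + ((int64_t)i << SK_PAGE_SHIFT);
	int64_t end = pagestart + ((int64_t)(i + 1) << SK_PAGE_SHIFT);
	return off <= start && end <= eob;
}

/* Original ordering: any error releases the page, covered or not. */
static unsigned
flags_old(unsigned flags, int error, int covered)
{
	if (error)
		flags |= SK_PG_RELEASED;
	else if (covered)
		flags &= ~SK_PG_RDONLY;
	return flags;
}

/* Patched ordering: the range test comes first, the error check second. */
static unsigned
flags_patched(unsigned flags, int error, int covered)
{
	if (covered)
		flags &= ~SK_PG_RDONLY;
	else if (error)
		flags |= SK_PG_RELEASED;
	return flags;
}
```

In this model the two orderings differ only when error is set and the page
passes the range test: the original marks it released, the patched version
clears the read-only flag and keeps the page. Without an error, or for a page
that fails the range test, both versions give the same result.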

Now I'm seeing another kind of corruption, which looks a lot like what I've
seen on my NFS servers:
the data block of the directory where I'm doing the test (which happens
to be the root directory of the fs and also the only directory present
on this fs) is zeroed out after the disk block allocation failure
(I checked this, the disk block really contains all 0).
I'll try to track down this one as well.

Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
Index: ufs_inode.c
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision
diff -u -p -u -r1.76.4.1 ufs_inode.c
--- ufs_inode.c 8 Feb 2009 19:08:23 -0000
+++ ufs_inode.c 2 Feb 2010 16:59:21 -0000
@@ -328,11 +328,11 @@ ufs_balloc_range(struct vnode *vp, off_t
        GOP_SIZE(vp, off + len, &eob, 0);
        for (i = 0; i < npages; i++) {
-               if (error) {
-                       pgs[i]->flags |= PG_RELEASED;
-               } else if (off <= pagestart + (i << PAGE_SHIFT) &&
+               if (off <= pagestart + (i << PAGE_SHIFT) &&
                    pagestart + ((i + 1) << PAGE_SHIFT) <= eob) {
                        pgs[i]->flags &= ~PG_RDONLY;
+               } else if (error) {
+                       pgs[i]->flags |= PG_RELEASED;
        if (error) {
