tech-kern archive


need help with kern/35704 (UBC-related)



hi,
I need some help with the reproducible issue reported in kern/35704.
This also happens with NetBSD 5.0_STABLE, and filling up the filesystem
isn't required: being over quota is enough.
I reproduced it with:
echo aaaa > m
while cat m >> p; do done
(so it's not an issue with cat reading and writing the same file).
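
For anyone who would rather drive this from a program than from the shell, here is a minimal C sketch of the same loop. It is only an illustration: the helper names append_records() and count_bad_records(), the REPRO_MAIN guard and the cap on the number of writes are all made up for this example. It appends the 5-byte record until a write fails, then re-reads the file and counts records that no longer match.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Append rec to path, reopening the file for every record as the shell's
 * ">>" does. Stops at the first failed open, failed write or short write
 * (EDQUOT or ENOSPC in the scenario above), or after max_writes records.
 * Returns the number of complete records written.
 */
long append_records(const char *path, const char *rec, size_t len, long max_writes)
{
    long n;

    assert(rec != NULL && len > 0);
    for (n = 0; n < max_writes; n++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            break;
        ssize_t w = write(fd, rec, len);
        close(fd);
        if (w != (ssize_t)len)
            break;
    }
    return n;
}

/*
 * Re-read path in len-byte records and count those that differ from rec.
 * A trailing partial record is only compared over the bytes present.
 * Returns -1 if the file cannot be opened.
 */
long count_bad_records(const char *path, const char *rec, size_t len)
{
    char buf[64];
    long bad = 0;
    size_t got;
    FILE *f;

    assert(rec != NULL && len > 0 && len <= sizeof(buf));
    if ((f = fopen(path, "rb")) == NULL)
        return -1;
    while ((got = fread(buf, 1, len, f)) > 0) {
        if (memcmp(buf, rec, got) != 0)
            bad++;
    }
    fclose(f);
    return bad;
}

#ifdef REPRO_MAIN
/* Hypothetical driver: run it in a directory where the user is near quota. */
int main(void)
{
    long n = append_records("p", "aaaa\n", 5, 10000000L);

    printf("%ld records written, %ld bad\n",
        n, count_bad_records("p", "aaaa\n", 5));
    return 0;
}
#endif
```

Built with -DREPRO_MAIN, a non-zero "bad" count after the loop stops would correspond to the garbage seen in p. The write cap is arbitrary and only keeps the loop from running forever on a filesystem without a quota.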
When the while loop ends, p contains garbage. Also I wonder if this
can cause filesystem corruption, because on reboot I occasionally get:
panic: kernel debugging assertion "(vp->v_iflag & VI_ONWORKLST)" failed: file "/dsk/l1/misc/bouyer/netbsd-5/src/sys/miscfs/genfs/genfs_io.c", line 1017
while unmounting the root filesystem (which is not the one with m and p).

What I found so far is that ffs_write() is called with small incremental
writes of 5 chars each. At one point it needs a new fragment to
extend the file, and ufs_balloc_range() returns an error (as expected).
Then ffs_write() calls ffs_truncate() with the old size.
Here's a kernel output with some printfs I added:
ffs_write(0xffffa000134073e8, osize 6120, uio->uio_offset 6120, uio_resid 5 nsize 6125)
ffs_write(0xffffa000134073e8, osize 6125, uio->uio_offset 6125, uio_resid 5 nsize 6130)
ffs_write(0xffffa000134073e8, osize 6130, uio->uio_offset 6130, uio_resid 5 nsize 6135)
ffs_write(0xffffa000134073e8, osize 6135, uio->uio_offset 6135, uio_resid 5 nsize 6140)
ffs_write(0xffffa000134073e8, osize 6140, uio->uio_offset 6140, uio_resid 5 nsize 6145)
chkdq error 69
ffs_write ufs_balloc_range(0xffffa000134073e8, 6140, 5) failed
ffs_write UFS_TRUNCATE(6140)

ufs_balloc_range() allocates the pages backing the range by which the
file will be extended, and does so before trying to allocate disk
blocks. If the block allocation fails, the pages are marked
PG_RELEASED (on success, PG_RDONLY is cleared instead) and
uvm_page_unbusy() is called. However, as this write does not cross a
page boundary, I guess no new page is really allocated, and the ones
that were used for previous writes to the file are reused.
I suspect the way ufs_balloc_range() handles the failure causes the
pages to be unmapped without being flushed, but I have no idea where
the garbage comes from. It obviously comes from some other files
open on the system (I got some ELF headers in there, for example).
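
The boundary argument can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and a 2048-byte fragment; neither size is stated in this report, and crosses_boundary() is a made-up helper. Under those assumptions the failing write (offset 6140, length 5) stays within one page but spills past byte 6144 into a new fragment, while the previous write (offset 6135) does neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the byte range [off, off + len) touches more than one unit
 * of the given size (a page or a fragment, for example).
 */
bool crosses_boundary(uint64_t off, uint64_t len, uint64_t unit)
{
    assert(unit > 0);
    return len > 0 && off / unit != (off + len - 1) / unit;
}
```

With these numbers, crosses_boundary(6140, 5, 4096) is false and crosses_boundary(6140, 5, 2048) is true, which would fit a new fragment being requested while the already-cached page is reused. A 1024-byte or 512-byte fragment gives the same answer, since 6144 is a multiple of both.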

When UFS_TRUNCATE() is called, we have v_writesize set to 6145 while
v_size is still 6140.

I don't know where to go from here, as I don't know the details of UBC
internals. Can anyone help?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

