tech-kern archive


Re: FS corruption because of bufio_cache pool depletion?



On Mon, Jan 25, 2010 at 10:52:03PM +0100, Christoph Badura wrote:
> I am seeing FS corruption on my development server in the source trees.
> The server is running Xen on i386 with a 128MB RAM dom0 and 256MB RAM domUs.
> I'm using netbsd-5 in the dom0 and some domUs, and -current in other domUs.
> 
> Typical ways to provoke corruption are rsync'ing a source tree from the
> vnd-backed xbd in a domU to a local partition in the dom0 or running "cvs
> update" in the dom0 on a tree.  The most obvious damage was corrupt CVS/Root
> and directory contents.

Can you give more details on the corruption?
Was it only directory entries that were corrupted, or did you notice
corruption in the data blocks too?
I'm seeing panics like:
bad dir ino 14212602 at offset 0: mangled entry
on NFS servers (a few times a year), and fsck confirms that the directory
is indeed corrupted. I've seen this with both netbsd-3 and netbsd-5.

> Once I got an I/O error in a domU from the xbd with the sources on it during
> a build.sh run.  At that point I noticed the following messages in the
> kernel message buffer:
> 
> raid1: IO failed after 5 retries.
> cgd1: error 5
> xbd IO domain 1: error 5

It seems RAIDframe doesn't do anything special on memory allocation failure.
It returns EIO for the whole request if it can't get an entry
from bufio_cache for the I/O to one component. Maybe it should wait and
retry the I/O later?
dk(4) does this ...
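
To make the two behaviours concrete, here is a minimal userspace sketch.
It is not RAIDframe or dk(4) code: toy_pool, toy_io_fail_fast,
toy_io_retry and toy_release_one are hypothetical names, and the pool is
just a counter standing in for a fixed-size cache such as bufio_cache.
The first function fails the whole request with EIO as soon as no entry
is available; the second waits (through a caller-supplied hook) and
retries a bounded number of times before giving up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_POOL_SIZE 2          /* hypothetical pool capacity */

/* Toy stand-in for a fixed-size buffer pool: only counts free entries. */
struct toy_pool {
    int free;
};

/* Non-blocking allocation: returns false when the pool is depleted. */
static bool toy_pool_get(struct toy_pool *p)
{
    if (p->free == 0)
        return false;
    p->free--;
    return true;
}

/* Return one entry to the pool. */
static void toy_pool_put(struct toy_pool *p)
{
    assert(p->free < TOY_POOL_SIZE);
    p->free++;
}

/* Policy A: fail the whole request with EIO if no entry is available. */
static int toy_io_fail_fast(struct toy_pool *p)
{
    if (!toy_pool_get(p))
        return EIO;
    /* ... the component I/O would be issued here ... */
    toy_pool_put(p);
    return 0;
}

/*
 * Policy B: on allocation failure, wait and retry up to max_retries
 * times. wait_hook stands in for sleeping until memory may be available.
 */
static int toy_io_retry(struct toy_pool *p, int max_retries,
    void (*wait_hook)(struct toy_pool *))
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (toy_pool_get(p)) {
            /* ... the component I/O would be issued here ... */
            toy_pool_put(p);
            return 0;
        }
        if (attempt < max_retries && wait_hook != NULL)
            wait_hook(p);
    }
    return EIO;                  /* still depleted after all retries */
}

/* Wait hook simulating another in-flight I/O completing meanwhile. */
static void toy_release_one(struct toy_pool *p)
{
    toy_pool_put(p);
}
```

With an empty pool, toy_io_fail_fast() reports EIO immediately, while
toy_io_retry() succeeds once the hook has released an entry. A real
driver would also have to bound the wait so that it cannot deadlock
against the very I/O that would free the memory.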

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--

