tech-kern archive


FS corruption because of bufio_cache pool depletion?



I am seeing FS corruption in the source trees on my development server.
The server is running Xen on i386, with a 128MB RAM dom0 and 256MB RAM domUs.
I'm running netbsd-5 in the dom0 and in some domUs, and -current in the other
domUs.

Typical ways to provoke the corruption are rsync'ing a source tree from the
vnd-backed xbd in a domU to a local partition in the dom0, or running "cvs
update" on a tree in the dom0.  The most obvious damage was corrupted CVS/Root
files and directory contents.

Once, during a build.sh run, I got an I/O error in a domU from the xbd holding
the sources.  At that point I noticed the following messages in the kernel
message buffer:

raid1: IO failed after 5 retries.
cgd1: error 5
xbd IO domain 1: error 5

"vmstat -m" was reporting a high "Fail" rate of ~200 requests for the "biopl"
pool at that point.
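
Error 5 is EIO.  For context, here is a minimal sketch (a made-up caller, not
the actual raid/cgd/xbd code) of how a depleted biopl pool can end up as a
failed I/O: getiobuf() takes its buf_t from bufio_cache, and with waitok=false
it allocates with PR_NOWAIT, so it returns NULL when the pool can't supply an
item and the caller has to fail the request.  All names below are hypothetical.

/*
 * Hypothetical caller, for illustration only: shows how a depleted
 * bufio_cache ("biopl") pool can turn into a failed I/O.  getiobuf()
 * allocates a buf_t from bufio_cache; with waitok=false it uses
 * PR_NOWAIT and returns NULL when no item is available.
 */
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/buf.h>
#include <sys/vnode.h>

static int
issue_nested_io(struct vnode *vp, struct buf *master)
{
	buf_t *nbp;

	/* Returns NULL once the biopl pool is depleted. */
	nbp = getiobuf(vp, false);
	if (nbp == NULL)
		return ENOMEM;	/* the request has to be failed or retried */

	/* Chain the new buf to the master request and send it down. */
	nestiobuf_setup(master, nbp, 0, master->b_bcount);
	return VOP_STRATEGY(vp, nbp);
}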

The dom0 uses two pairs of SCSI disks in RAIDframe RAID1 configurations, with
cgds on top.

Greg Oster suggested priming the bufio_cache pool to see what effect that has.

I patched kern/vfs_bio.c as follows:

RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.210
diff -u -r1.210 vfs_bio.c
--- vfs_bio.c   11 Sep 2008 09:14:46 -0000      1.210
+++ vfs_bio.c   25 Jan 2010 20:43:45 -0000
@@ -471,6 +471,8 @@
            "bufpl", NULL, IPL_SOFTBIO, NULL, NULL, NULL);
        bufio_cache = pool_cache_init(sizeof(buf_t), 0, 0, 0,
            "biopl", NULL, IPL_BIO, NULL, NULL, NULL);
+       pool_cache_setlowat(bufio_cache, 100);
+       pool_prime(&bufio_cache->pc_pool, 100);
 
        bufmempool_allocator.pa_backingmap = buf_map;
        for (i = 0; i < NMEMPOOLS; i++) {

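For reference, my reading of pool(9)/pool_cache(9): pool_cache_setlowat() sets
the low-water mark, i.e. the number of items the backing pool tries to keep
available, and pool_prime() allocates storage for that many items right away,
so later PR_NOWAIT allocations are less likely to come back empty.  The same
idea in isolation, with a made-up pool cache:

/*
 * Hypothetical example, not the actual patch: create a pool cache and
 * keep storage for (at least) 100 items around so that PR_NOWAIT
 * allocations are less likely to fail under memory pressure.  The
 * cache name and item type are made up.
 */
#include <sys/param.h>
#include <sys/pool.h>

struct example_item {
	int	ei_data;
};

static pool_cache_t example_cache;

void
example_pool_init(void)
{
	example_cache = pool_cache_init(sizeof(struct example_item), 0, 0, 0,
	    "examplepl", NULL, IPL_BIO, NULL, NULL, NULL);

	/* Low-water mark: try to keep 100 items available. */
	pool_cache_setlowat(example_cache, 100);
	/* Allocate storage for those 100 items up front. */
	pool_prime(&example_cache->pc_pool, 100);
}
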
Since applying that patch I haven't been able to reproduce the FS corruption
that I could reliably trigger before.  The failure numbers ("Fail" column) for
the biopl pool also look much better:

Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
biopl        172    16880   19    15998   199   152    47    88    10   inf    8

Does that ring a bell with anyone?

--chris

