Subject: kern/23491: fs corruption
To: None <gnats-bugs@gnats.netbsd.org>
From: None <recht@NetBSD.org>
List: netbsd-bugs
Date: 11/18/2003 01:04:35
>Number:         23491
>Category:       kern
>Synopsis:       kernel panics resulting in (massive) fs corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 19 17:42:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Marc Recht
>Release:        NetBSD 1.6ZF
>Organization:
	
>Environment:
	
	
System: NetBSD leeloo.intern.geht.de 1.6ZF NetBSD 1.6ZF (LEELOO) #0: Sun Nov 16 12:46:39 CET 2003 marc@leeloo.intern.geht.de:/usr/obj/sys/arch/i386/compile/LEELOO i386
Architecture: i386
Machine: i386
>Description:
For a while now I have been suffering from panics which result in (sometimes massive) fs corruption. I have not yet been able to find a pattern that triggers this behaviour. It always seems to happen under medium/high load and I/O (disk + net). The weird thing is what gets corrupted.
E.g.:
I have /, /var, /home and /usr as separate partitions (on the same disk) and /tmp as mfs. After doing some I/O on /home the box panicked, and after the fsck
/var/log (and other stuff in /var), some stuff in /usr (e.g. /usr/libexec/getty) and much stuff from / were gone. In /, /dev was missing completely, as was most of /etc.

A reliable way to panic my box is to copy a large amount of data from one cgd to another. (I've never managed to copy more than 8GB before the box panics.) But this normally only ends in unclean disks...

Normally I don't get a crash dump, but today I got one.
panic("blkfree: bad size");
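
For reference, a minimal sketch of the sanity check that, as far as I recall, guards this panic in ffs_blkfree(). It is paraphrased from memory and not copied from ffs_alloc.c 1.70. struct fs_geom, frag_num() and blkfree_bad_size() are hypothetical names used only for illustration, and the block/fragment sizes are assumptions, since this report does not include the newfs parameters of the affected filesystem.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few superblock fields the check needs. */
struct fs_geom {
    int32_t bsize;  /* block size in bytes (assumed, not from the report) */
    int32_t fsize;  /* fragment size in bytes (assumed) */
    int32_t frag;   /* fragments per block, a power of two */
};

/* Index of fragment bno within its block. */
static int64_t frag_num(const struct fs_geom *fs, int64_t bno)
{
    return bno & (fs->frag - 1);
}

/*
 * Returns 1 if (bno, size) would be rejected as a "bad size", else 0.
 * The three conditions mirror my recollection of the kernel check:
 *   - size is larger than a block (or negative),
 *   - size is not a whole number of fragments,
 *   - the run of fragments would cross the end of its block.
 */
static int blkfree_bad_size(const struct fs_geom *fs, int64_t bno, long size)
{
    if (size < 0 || size > fs->bsize)
        return 1;
    if (size % fs->fsize != 0)
        return 1;
    if (frag_num(fs, bno) + size / fs->fsize > fs->frag)
        return 1;
    return 0;
}
```

Under an assumed 8192/1024 layout, the bno=514, size=8192 pair from frame #3 of the backtrace below fails the third condition, because a full block is being freed at a fragment address that is not block-aligned. Under an assumed 16384/2048 layout the same pair would pass, so the actual geometry matters for interpreting the dump.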

(gdb) bt
#0  0x00000001 in ?? ()
#1  0xc0263c26 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:769
#2  0xc01f6e55 in panic (
    fmt=0xc0321c18 "ÂÚãðþ]ű2\bE\032äW\022\030nݹêi\024Ö¢\rr\016·GÉ Í\215§;'\204)\205Á") at /usr/src/sys/kern/subr_prf.c:242
#3  0xc01920b2 in ffs_blkfree (ip=0xe85c14d0, bno=514, size=8192)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1530
#4  0xc0198151 in ffs_truncate (v=0xe85c5ce4)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:427
#5  0xc022b2e9 in VOP_TRUNCATE (vp=0xe85c05dc, length=0, flags=0,
    cred=0xc1cfff00, p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:1490
#6  0xc01ae9fc in ufs_setattr (v=0xe85c5d84)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:441
#7  0xc022a917 in VOP_SETATTR (vp=0xe85c05dc, vap=0xe85c5dd4, cred=0xc1cfff00,
    p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:388
#8  0xc0229636 in vn_open (ndp=0xe85c5e84, fmode=1026, cmode=420)
    at /usr/src/sys/kern/vfs_vnops.c:284
#9  0xc0224685 in sys_open (l=0xe855e7f8, v=0xe85c5f64, retval=0xe85c5f5c)
    at /usr/src/sys/kern/vfs_syscalls.c:1120
#10 0xc026e1e4 in syscall_plain (frame=0xe85c5fa8)
    at /usr/src/sys/arch/i386/i386/syscall.c:159
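
A small sketch for decoding the vn_open arguments shown in frame #8 (fmode=1026, cmode=420). The constants are spelled out with a BSD_ prefix (hypothetical names) because the numeric values of the open flags differ between systems; the values are the ones I believe the BSD <sys/fcntl.h> uses. has_flag() and print_open_args() are illustrative helpers only.

```c
#include <stdio.h>

/* Kernel-internal open flag values as I believe BSD defines them. */
enum {
    BSD_FREAD   = 0x0001,
    BSD_FWRITE  = 0x0002,
    BSD_O_CREAT = 0x0200,
    BSD_O_TRUNC = 0x0400
};

/* Nonzero if flag is set in fmode. */
static int has_flag(int fmode, int flag)
{
    return (fmode & flag) != 0;
}

/* Print fmode in hex with the flags it contains, and cmode in octal. */
static void print_open_args(int fmode, int cmode)
{
    printf("fmode=%#x:%s%s%s%s cmode=%#o\n", fmode,
        has_flag(fmode, BSD_FREAD) ? " FREAD" : "",
        has_flag(fmode, BSD_FWRITE) ? " FWRITE" : "",
        has_flag(fmode, BSD_O_CREAT) ? " O_CREAT" : "",
        has_flag(fmode, BSD_O_TRUNC) ? " O_TRUNC" : "",
        (unsigned)cmode);
}
```

With these values, 1026 is FWRITE | O_TRUNC and 420 is mode 0644, which fits the path in the backtrace: an open for writing with truncation, leading through VOP_SETATTR to ffs_truncate with length=0. Note that gdb shows the value of fmode at the time of the dump, so this does not tell whether O_CREAT was set in the original open(2) call.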

$NetBSD: ffs_alloc.c,v 1.70 2003/09/05 21:58:35 itojun Exp $
$NetBSD: ffs_inode.c,v 1.60 2003/08/07 16:34:30 agc Exp $
$NetBSD: ufs_vnops.c,v 1.109 2003/11/08 06:38:10 dbj Exp $
$NetBSD: subr_prf.c,v 1.93 2003/08/07 16:31:53 agc Exp $
$NetBSD: vfs_syscalls.c,v 1.201 2003/11/15 01:19:38 thorpej Exp $
$NetBSD: vfs_vnops.c,v 1.75 2003/10/15 11:29:01 hannken Exp $
$NetBSD: vnode_if.c,v 1.45 2003/08/07 16:32:05 agc Exp $
$NetBSD: machdep.c,v 1.543 2003/10/28 22:52:53 mycroft Exp $
$NetBSD: syscall.c,v 1.27 2003/10/31 03:28:13 simonb Exp $
$NetBSD: vm_machdep.c,v 1.112 2003/10/27 14:11:47 junyoung Exp $

(I can put the core for this kernel online.)

The controller is an onboard VIA controller (VIA Technologies VT8233 ATA100 controller).

$NetBSD: pciide_machdep.c,v 1.3 2003/10/30 21:19:54 fvdl Exp $
$NetBSD: pciide_common.c,v 1.2 2003/10/23 19:29:35 bouyer Exp $

I'm pretty sure that the disks and the cables are ok.
	
>How-To-Repeat:
	
>Fix:
unknown
	

>Release-Note:
>Audit-Trail:
>Unformatted: