Subject: kern/23494: panic results in (massive) fs corruption
To: None <gnats-bugs@gnats.netbsd.org>
From: Marc Recht <recht@netbsd.org>
List: netbsd-bugs
Date: 11/19/2003 14:27:40
>Number:         23494
>Category:       kern
>Synopsis:       panic results in (massive) fs corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 19 17:45:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Marc Recht
>Release:        NetBSD 1.6ZF
>Organization:
>Environment:
System: NetBSD leeloo.intern.geht.de 1.6ZF NetBSD 1.6ZF (LEELOO) #0: Tue Nov 18 23:28:17 CET 2003
marc@leeloo.intern.geht.de:/usr/src/sys/arch/i386/compile/LEELOO i386
Architecture: i386
Machine: i386
>Description:
For a while now I have been suffering from panics that result in
(sometimes massive) fs corruption. I have not yet found a pattern that
triggers this behaviour. It always seems to happen under medium/high load
and I/O (disk + net). The weird thing is what gets corrupted.
E.g.:
I have /, /var, /home and /usr as separate partitions (on the same disk) and
/tmp as mfs. After doing some I/O on /home the box panicked, and after the
fsck, /var/log (and other stuff in /var), some stuff in /usr (e.g.
/usr/libexec/getty) and much of / were gone. In /, /dev was missing
completely, as was most of /etc.

A reliable way to panic my box is to copy a large amount of data from one cgd
to another. (I have never managed to copy more than 8GB before the box
panics.) But this normally only results in unclean disks...

Normally I don't get a crash dump, but today I got one.
panic("blkfree: bad size");

(gdb) bt
#0  0x00000001 in ?? ()
#1  0xc0263c26 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:769
#2  0xc01f6e55 in panic (
    fmt=0xc0321c18 "ÂÚãðþ]Å±2\bE\032äW\022\030nÝ¹êi\024Ö¢\rr\016·GÉ Í\215§;'\204)\205Á") at /usr/src/sys/kern/subr_prf.c:242
#3  0xc01920b2 in ffs_blkfree (ip=0xe85c14d0, bno=514, size=8192)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1530
#4  0xc0198151 in ffs_truncate (v=0xe85c5ce4)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:427
#5  0xc022b2e9 in VOP_TRUNCATE (vp=0xe85c05dc, length=0, flags=0,
    cred=0xc1cfff00, p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:1490
#6  0xc01ae9fc in ufs_setattr (v=0xe85c5d84)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:441
#7  0xc022a917 in VOP_SETATTR (vp=0xe85c05dc, vap=0xe85c5dd4, cred=0xc1cfff00,
    p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:388
#8  0xc0229636 in vn_open (ndp=0xe85c5e84, fmode=1026, cmode=420)
    at /usr/src/sys/kern/vfs_vnops.c:284
#9  0xc0224685 in sys_open (l=0xe855e7f8, v=0xe85c5f64, retval=0xe85c5f5c)
    at /usr/src/sys/kern/vfs_syscalls.c:1120
#10 0xc026e1e4 in syscall_plain (frame=0xe85c5fa8)
    at /usr/src/sys/arch/i386/i386/syscall.c:159
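
The arguments in frame #8 can be decoded to see what kind of open() led to
the truncate. A minimal sketch in Python, assuming the flag values from
NetBSD's <sys/fcntl.h> and assuming that the kernel hands vn_open the
userland open flags plus one (the FFLAGS() conversion). The helper name
decode_fmode is made up for illustration:

```python
# Flag bits assumed from NetBSD's <sys/fcntl.h>; check your own headers.
KERNEL_FFLAGS = {
    0x0001: "FREAD",
    0x0002: "FWRITE",
    0x0004: "O_NONBLOCK",
    0x0008: "O_APPEND",
    0x0200: "O_CREAT",
    0x0400: "O_TRUNC",
    0x0800: "O_EXCL",
}

def decode_fmode(fmode):
    """Return the names of known bits set in fmode; unknown bits are appended as hex."""
    names = [name for bit, name in sorted(KERNEL_FFLAGS.items()) if fmode & bit]
    # The keys are distinct single bits, so their sum equals their bitwise OR.
    unknown = fmode & ~sum(KERNEL_FFLAGS)
    if unknown:
        names.append(hex(unknown))
    return names

print(decode_fmode(1026))  # fmode from frame #8
print(oct(420))            # cmode from frame #8
```

Under these assumptions fmode=1026 is FWRITE|O_TRUNC without O_CREAT, i.e. an
existing file opened for writing with truncation, which fits the
vn_open -> VOP_SETATTR -> VOP_TRUNCATE (length=0) path in the trace.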

$NetBSD: ffs_alloc.c,v 1.70 2003/09/05 21:58:35 itojun Exp $
$NetBSD: ffs_inode.c,v 1.60 2003/08/07 16:34:30 agc Exp $
$NetBSD: ufs_vnops.c,v 1.109 2003/11/08 06:38:10 dbj Exp $
$NetBSD: subr_prf.c,v 1.93 2003/08/07 16:31:53 agc Exp $
$NetBSD: vfs_syscalls.c,v 1.201 2003/11/15 01:19:38 thorpej Exp $
$NetBSD: vfs_vnops.c,v 1.75 2003/10/15 11:29:01 hannken Exp $
$NetBSD: vnode_if.c,v 1.45 2003/08/07 16:32:05 agc Exp $
$NetBSD: machdep.c,v 1.543 2003/10/28 22:52:53 mycroft Exp $
$NetBSD: syscall.c,v 1.27 2003/10/31 03:28:13 simonb Exp $
$NetBSD: vm_machdep.c,v 1.112 2003/10/27 14:11:47 junyoung Exp $

(I can put the core for this kernel online.)
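
For reference, the arguments in frame #3 (bno=514, size=8192) can be checked
against a model of the sanity test that guards the "blkfree: bad size" panic.
The sketch below is Python, not kernel code. The condition is only an
approximation of the check in ffs_blkfree, and the block/fragment sizes are
hypothetical because the filesystem geometry is not stated in this report:

```python
def blkfree_bad_size(bno, size, bsize, fsize):
    """Approximate model of the size check in ffs_blkfree (an assumption).

    bno   -- fragment-granular block number being freed
    size  -- number of bytes being freed
    bsize -- filesystem block size in bytes
    fsize -- filesystem fragment size in bytes
    """
    frags_per_block = bsize // fsize
    return (
        size > bsize              # more than one full block
        or size % fsize != 0      # not a whole number of fragments
        # the fragments being freed would run past the end of their block
        or bno % frags_per_block + size // fsize > frags_per_block
    )

# Hypothetical geometries; the real values for this disk are unknown.
for bsize, fsize in [(8192, 1024), (16384, 2048)]:
    print(bsize, fsize, blkfree_bad_size(514, 8192, bsize, fsize))
```

In this model, with an 8192/1024 layout, bno=514 is not aligned to a block
boundary (514 % 8 == 2), so freeing a full 8192-byte block there would trip
the check.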

The controller is an onboard VIA controller (VIA Technologies VT8233 ATA100
controller).

$NetBSD: pciide_machdep.c,v 1.3 2003/10/30 21:19:54 fvdl Exp $
$NetBSD: pciide_common.c,v 1.2 2003/10/23 19:29:35 bouyer Exp $

I'm pretty sure that the disks and the cables are ok.
>How-To-Repeat:
unknown
>Fix:
unknown

>Release-Note:
>Audit-Trail:
>Unformatted: