Subject: kern/31516: NetBSD-current panics under high load
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Matthias Scheler <tron@colwyn.zhadum.de>
List: netbsd-bugs
Date: 10/08/2005 23:35:01
>Number:         31516
>Category:       kern
>Synopsis:       NetBSD-current panics under high load
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 08 23:35:00 +0000 2005
>Originator:     tron@colwyn.zhadum.de
>Release:        NetBSD 3.99.9 sources from 2005-10-05 or newer
>Organization:
Matthias Scheler                                  http://scheler.de/~matthias/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 3.99.9 NetBSD 3.99.9 (LYSSA) #0: Tue Oct 4 09:35:16 BST 2005 tron@lyssa.zhadum.org.uk:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
Kernels built from sources dated 2005-10-05 or newer eventually panic
on my system. Running "build.sh" is enough to trigger a panic like
this:

uvm_fault(0xc0555da0, 0xcecb6000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0351ccc cs 8 eflags 10202 cr2 cecb6e60 ilevel 0
panic: trap
Begin traceback...
trap() at netbsd:trap+0x162
--- trap (number 6) ---
uvm_fault(cdc817ec,bfbf9000,0,1,cda31110) at netbsd:uvm_fault+0x108f
trap() at netbsd:trap+0x35b
--- trap (number 6) ---
0xbbb59f6c:
End traceback...
syncing disks... uvm_fault(0xc0555da0, 0xcecd3000, 0, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 2 eip c02fb065 cs 8 eflags 10282 cr2 cecd3ae8 ilevel 6
panic: trap
Begin traceback...
trap() at netbsd:trap+0x162
--- trap (number 6) ---
genfs_putpages(cdcffab0,cdcffab7,c0606000,1,c04659c0) at netbsd:genfs_putpages+0x629
VOP_PUTPAGES(cebf66a4,0,0,0,0) at netbsd:VOP_PUTPAGES+0x42
ffs_full_fsync(cdcffbe0,cda31110,cdcffb68,c02a5823,cdcffb70) at netbsd:ffs_full_fsync+0x322
ffs_fsync(cdcffbe0,2,cdcffc08,10001,c0465200) at netbsd:ffs_fsync+0x4f
VOP_FSYNC(cebf66a4,cc7a0a80,0,0,0) at netbsd:VOP_FSYNC+0x4e
ffs_sync(c278c000,2,cc7a0a80,cdc8c00c,cc7ab318) at netbsd:ffs_sync+0x239
sys_sync(cda31110,0,0,0,100) at netbsd:sys_sync+0xcd
vfs_shutdown(c048d7fe,5,0,0,cdcffcd4) at netbsd:vfs_shutdown+0x74
cpu_reboot(100,0,fffd,c048691a,c02c6606) at netbsd:cpu_reboot+0x141
panic(c0494054,6,0,c0351ccc,8) at netbsd:panic+0x11b
trap() at netbsd:trap+0x162
--- trap (number 6) ---
uvm_fault(cdc817ec,bfbf9000,0,1,cda31110) at netbsd:uvm_fault+0x108f
trap() at netbsd:trap+0x35b
--- trap (number 6) ---
0xbbb59f6c:
End traceback...
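
As a reading aid, the two "trap type 6" lines above can be decoded with
a short script. This is only a sketch in Python: it assumes the field
layout seen in the messages above and the standard x86 page-fault error
code bits (bit 0 = protection violation, bit 1 = write, bit 2 = user
mode). The name decode_trap and the keys of its result are made up for
illustration and are not part of any NetBSD tool.

```python
import re

PAGE_SIZE = 4096  # i386 page size

# Bits of the x86 page-fault error code (the "code" field of a type 6 trap).
PF_PRESENT = 0x1  # clear: page not present, set: protection violation
PF_WRITE = 0x2    # clear: read access, set: write access
PF_USER = 0x4     # clear: supervisor mode, set: user mode

# Field layout inferred from the panic messages in this report.
TRAP_RE = re.compile(
    r"trap type (?P<type>\d+) code (?P<code>[0-9a-f]+) "
    r"eip (?P<eip>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) "
    r"eflags (?P<eflags>[0-9a-f]+) cr2 (?P<cr2>[0-9a-f]+) "
    r"ilevel (?P<ilevel>[0-9a-f]+)"
)


def decode_trap(line):
    """Return the faulting address, page and access kind of a trap line."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a trap line: %r" % line)
    code = int(m.group("code"), 16)
    cr2 = int(m.group("cr2"), 16)  # cr2 holds the faulting address
    return {
        "eip": int(m.group("eip"), 16),
        "fault_addr": cr2,
        "fault_page": cr2 & ~(PAGE_SIZE - 1),
        "access": "write" if code & PF_WRITE else "read",
        "cause": ("protection violation" if code & PF_PRESENT
                  else "page not present"),
        "mode": "user" if code & PF_USER else "supervisor",
    }


if __name__ == "__main__":
    for line in (
        "trap type 6 code 0 eip c0351ccc cs 8 eflags 10202 "
        "cr2 cecb6e60 ilevel 0",
        "trap type 6 code 2 eip c02fb065 cs 8 eflags 10282 "
        "cr2 cecd3ae8 ilevel 6",
    ):
        info = decode_trap(line)
        print("%s of page 0x%08x (%s, %s mode)" % (
            info["access"], info["fault_page"], info["cause"], info["mode"]))
```

Read this way, the first fault is a read and the second a write, both
in supervisor mode on pages that are not present, and the page
addresses match the second argument of the uvm_fault() messages
(0xcecb6000 and 0xcecd3000).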

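This looks like a cascade of problems: the first panic is raised from
uvm_fault(), and a second page fault in genfs_putpages() causes another
panic while the disks are being synced.
Here is what "gdb" gets out of the crash dump: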

(gdb) where
#0  0xc065c000 in ?? ()
#1  0xc037074a in cpu_reboot (howto=260, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:752
#2  0xc02c5b4c in panic (fmt=0xc0494054 "trap")
    at /usr/src/sys/kern/subr_prf.c:244
#3  0xc03795f2 in trap (frame=0xcdcff8f0)
    at /usr/src/sys/arch/i386/i386/trap.c:296
#4  0xc010aec9 in calltrap ()
#5  0xc02f8251 in VOP_PUTPAGES (vp=0xcebf66a4, offlo=0, offhi=0, flags=17)
    at /usr/src/sys/kern/vnode_if.c:2015
#6  0xc024da26 in ffs_full_fsync (v=0xcdcffbe0)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:383
#7  0xc024d401 in ffs_fsync (v=0xcdcffbe0)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:277
#8  0xc02f7b5d in VOP_FSYNC (vp=0xcebf66a4, cred=0xcc7a0a80, flags=0, offlo=0, 
    offhi=0, p=0xcdc8c00c) at /usr/src/sys/kern/vnode_if.c:782
#9  0xc024b3e3 in ffs_sync (mp=<incomplete type>, waitfor=2, cred=0xcc7a0a80, 
    p=0xcdc8c00c) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1332
#10 0xc02efc7e in sys_sync (l=0xcda31110, v=0x0, retval=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:653
#11 0xc02ede4b in vfs_shutdown () at /usr/src/sys/kern/vfs_subr.c:2222
#12 0xc037075e in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:738
#13 0xc02c5b4c in panic (fmt=0xc0494054 "trap")
    at /usr/src/sys/kern/subr_prf.c:244
#14 0xc03795f2 in trap (frame=0xcdcffd94)
    at /usr/src/sys/arch/i386/i386/trap.c:296
#15 0xc010aec9 in calltrap ()
#16 0xc03797eb in trap (frame=0xcdcfffa8)
    at /usr/src/sys/arch/i386/i386/trap.c:583
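
The two cpu_reboot() frames above differ in their "howto" argument (256
in frame #12, 260 in frame #1). The following Python sketch decodes the
value into RB_* flag names. The flag values are a subset of
<sys/reboot.h> quoted from memory, so treat them as an assumption and
check them against the source tree in use; decode_howto is a
hypothetical helper name.

```python
# Subset of the RB_* reboot flags; values are an assumption quoted from
# memory of <sys/reboot.h> and should be verified against the sources.
RB_FLAGS = {
    0x001: "RB_ASKNAME",
    0x002: "RB_SINGLE",
    0x004: "RB_NOSYNC",
    0x008: "RB_HALT",
    0x100: "RB_DUMP",
}


def decode_howto(howto):
    """Return the names of the flags set in a cpu_reboot() howto value."""
    names = [name for bit, name in sorted(RB_FLAGS.items()) if howto & bit]
    known = sum(bit for bit in RB_FLAGS if howto & bit)
    unknown = howto & ~known
    if unknown:
        # Keep bits this table does not know about visible as hex.
        names.append(hex(unknown))
    return names or ["RB_AUTOBOOT"]


if __name__ == "__main__":
    for howto in (256, 260):
        print(howto, "|".join(decode_howto(howto)))
```

Under these assumptions 256 is RB_DUMP alone and 260 is
RB_NOSYNC|RB_DUMP, i.e. the panic raised during "syncing disks..."
asked for a dump without another sync attempt.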

Two different NetBSD users (one on "current-users@NetBSD.org" and one in
private e-mail) confirmed that they are seeing the same problem.

My system is a 2.4GHz Pentium IV with 2GB of memory running a uniprocessor
kernel. It uses FFSv1 with soft dependencies enabled on all partitions.
I haven't touched the hardware in months, and it works fine with a
2005-10-04 kernel (e.g. when compiling a lot of packages).

>How-To-Repeat:
Run "./build.sh -j 2" on an FFSv1 file system.

>Fix:
None provided.