Subject: kern/28730: lfs related kernel panics and page fault traps
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <ctribo@dtcc.edu>
List: netbsd-bugs
Date: 12/21/2004 03:03:01
>Number:         28730
>Category:       kern
>Synopsis:       lfs related kernel panics and page fault traps
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Dec 21 03:03:00 +0000 2004
>Originator:     Chris Tribo
>Release:        2.99.11
>Organization:
>Environment:
NetBSD foobar.dtcc.edu 2.99.11 NetBSD 2.99.11 (GENERIC.MPACPI.DEBUG) #0: Mon Dec 20 09:19:22 EST 2004  blah@blah.dtcc.edu:/usr/obj/sys/arch/i386/compile/GENERIC.MPACPI.DEBUG i386
>Description:
I created a new directory on my lfs partition, went to add a user
account, and the system panicked. This can be triggered by just making a
directory and typing sync. The fs was freshly made with newfs_lfs, both
with no options and with -b 8192 -f 8192, as was once suggested to work
around a file coalescing problem.

uvm_fault(0xc0a20cc0, 0xc5ed1000, 0, 2) -> 0xe
kernel: page fault trap, code=0
Stopped in pid 16.1 (ioflush) at netbsd:lfs_update_single+0x5da: movl %eax,0(%edx,%ecx,1)
db{0}>

This is GENERIC.MPACPI from sources as of 09:00 UTC-5 today.

I had to set ddb.onpanic=0 and get a crash dump since USB keyboards 
don't appear to work in ddb, and this machine has no legacy ports.

The panic line is panic: lfs_updatemeta: fragment is not last block.

Then I booted a kernel with debug, diagnostic and -g added to it. Now 
savecore says

savecore: reboot after panic: panic: kernel diagnostic assertion "sp->vp == NULL" failed: file "/usr/src/sys/ufs/lfs/lfs_segment.c", line 1092

GNU gdb 5.3nb1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...
(gdb) target kcore /usr/crash/netbsd.1.core
panic: kernel %sassertion "%s" failed: file "%s", line %d
#0  0x00000000 in ?? ()
(gdb) where
#0  0x00000000 in ?? ()
#1  0xc0a75000 in ?? ()
#2  0xc0433df5 in cpu_reboot (howto=260, bootstr=0x0)
     at /usr/src/sys/arch/i386/i386/machdep.c:751
#3  0xc03961f4 in panic (
     fmt=0xc07b61c0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
     at /usr/src/sys/kern/subr_prf.c:242
#4  0xc05e0f38 in __assert (t=0xc0713128 "diagnostic ",
     f=0xc0773fa0 "/usr/src/sys/ufs/lfs/lfs_segment.c", l=1092,
     e=0xc07237f0 "sp->vp == NULL") at /usr/src/sys/lib/libkern/__assert.c:47
#5  0xc03314fd in lfs_gather (fs=0xc3117800, sp=0xce890000, vp=0xce6563f8,
     match=0xc0333778 <lfs_match_data>)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:1177
#6  0xc0330763 in lfs_writefile (fs=0xc3117800, sp=0xce890000, vp=0xce6563f8)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:753
#7  0xc032fe1a in lfs_writevnodes (fs=0xc3117800, mp=<incomplete type>,
     sp=0xce890000, op=1) at /usr/src/sys/ufs/lfs/lfs_segment.c:498
#8  0xc033052d in lfs_segwrite (mp=<incomplete type>, flags=5)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:582
#9  0xc033a623 in lfs_sync (mp=0xc2fbf000, waitfor=2, cred=0xccc8d000,
     p=0xcd5f2e5c) at /usr/src/sys/ufs/lfs/lfs_vfsops.c:1473
#10 0xc03c16f2 in sys_sync (l=0xcd5c99cc, v=0x0, retval=0x0)
     at /usr/src/sys/kern/vfs_syscalls.c:625
#11 0xc03bfa63 in vfs_shutdown () at /usr/src/sys/kern/vfs_subr.c:2693
#12 0xc0433e09 in cpu_reboot (howto=256, bootstr=0x0)
     at /usr/src/sys/arch/i386/i386/machdep.c:737
#13 0xc03961f4 in panic (
     fmt=0xc07b61c0 "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
     at /usr/src/sys/kern/subr_prf.c:242
#14 0xc05e0f38 in __assert (t=0xc0713128 "diagnostic ",
     f=0xc0773fa0 "/usr/src/sys/ufs/lfs/lfs_segment.c", l=1219,
     e=0xc072381b "daddr <= LFS_MAX_DADDR")
     at /usr/src/sys/lib/libkern/__assert.c:47
#15 0xc0331a6e in lfs_update_single (fs=0xc3117800, sp=0xce890000,
     vp=0xce656740, lbn=0, ndaddr=2259, size=1024)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:1287
#16 0xc0331f7c in lfs_updatemeta (sp=0xce890000)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:1425
#17 0xc03314c1 in lfs_gather (fs=0xc3117800, sp=0xce890000, vp=0xce656740,
     match=0xc0333778 <lfs_match_data>)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:1173
#18 0xc0330763 in lfs_writefile (fs=0xc3117800, sp=0xce890000, vp=0xce656740)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:753
#19 0xc032fe1a in lfs_writevnodes (fs=0xc3117800, mp=<incomplete type>,
     sp=0xce890000, op=1) at /usr/src/sys/ufs/lfs/lfs_segment.c:498
#20 0xc033052d in lfs_segwrite (mp=<incomplete type>, flags=5)
     at /usr/src/sys/ufs/lfs/lfs_segment.c:582
#21 0xc033a623 in lfs_sync (mp=0xc2fbf000, waitfor=2, cred=0xccc8d000,
     p=0xcd5f2e5c) at /usr/src/sys/ufs/lfs/lfs_vfsops.c:1473
#22 0xc03c16f2 in sys_sync (l=0xcd5c99cc, v=0xcd63ff64, retval=0xcd63ff5c)
     at /usr/src/sys/kern/vfs_syscalls.c:625
#23 0xc043e001 in syscall_plain (frame=0xcd63ffa8)
     at /usr/src/sys/arch/i386/i386/syscall.c:161
(gdb)

Do I need GDB 6 to get a trace of another CPU since this is an MP 
kernel? Let me know if there's any other information you need.
>How-To-Repeat:
Run a GENERIC.MPACPI kernel, create a directory on an lfs slice, and type sync.
>Fix:
Christos mentioned (http://mail-index.netbsd.org/current-users/2004/12/20/0011.html) that the switch to unsigned ints in the ufs code could be causing this. I'm not sure whether CPU 2 is doing anything interesting, so it could also be MP related.

I can make access to the box or core available upon request.